Validity The agreement between a test score or measure and the quality it is believed to measure, defined sometimes as the answer to the question "Does the test measure what it is supposed to measure?"
What are the three main types of validity evidence? Construct-related, criterion-related, and content-related
What prerequisites exist for validity? Reliability. One cannot infer that a construct exists if it cannot be reliably measured.
Face Validity The mere appearance that a measure has validity. A test has face validity if the items seem reasonably related to the perceived purpose of the test. It does not offer evidence to support conclusions drawn from test scores.
Content Validity Adequacy of representation of the conceptual domain that the test is designed to cover. Besides face validity, it is the only type of validity evidence that is logical rather than statistical. Ex. A score on a history test should represent your comprehension of history.
Construct Underrepresentation relation to content validity Construct underrepresentation describes the failure to capture important components of a construct. Ex. If a test of math knowledge included algebra but not geometry, its validity would be threatened.
Construct-irrelevant variance relation to content validity Occurs when scores are influenced by factors irrelevant to the construct. Ex. A test of intelligence might be influenced by reading comprehension, test anxiety, or illness.
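Content Validity Ratio and calculation CVR = (ne - N/2) / (N/2), where ne = number of panelists who rate the item as essential and N = total number of panelists. Panelists rate each test item as essential or not essential.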
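A minimal sketch of the content validity ratio, assuming Lawshe's usual form CVR = (ne - N/2) / (N/2), with ne the number of panelists rating an item essential and N the panel size. The function name and the 10-person panel are hypothetical, chosen only for illustration.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Return CVR = (ne - N/2) / (N/2) for a single test item."""
    if n_panelists <= 0:
        raise ValueError("n_panelists must be positive")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical panel of 10 raters judging one item.
print(content_validity_ratio(10, 10))  # every rater says "essential" -> 1.0
print(content_validity_ratio(5, 10))   # exactly half say "essential" -> 0.0
print(content_validity_ratio(0, 10))   # no rater says "essential" -> -1.0
```

The ratio runs from -1 to +1, so it is positive only when more than half of the panel rates the item essential.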
Criterion A criterion is the standard against which the test is compared
Criterion-related Validity How well the test corresponds with a particular criterion. Shown by high correlations between a test and a particular criterion measure.
Three subtypes of criterion-related validity Predictive validity, concurrent validity, and postdictive validity
Predictive Validity The accuracy with which test scores predict a criterion obtained at a later time. Ex. The SAT is the predictor and college GPA is the criterion.
Concurrent Validity The degree to which test scores are related to a criterion that is measured at the same time.
Postdictive Validity The accuracy with which a test score predicts a previously obtained criterion. Ex. An ASPD scale and criminal history.
Validity coefficient The relationship between a test and a criterion (correlation usually), the extent to which the test is valid for making statements about the criterion.
Squared validity coefficient The percentage of variation in the criterion we can expect to know in advance because of our knowledge of the test scores. If it is low, using the test may not be worthwhile.
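A small sketch of the validity coefficient and its square, assuming the coefficient is a Pearson correlation between test scores and criterion scores (the usual case). The function names and the example value of 0.40 are illustrative assumptions, not values from these notes.

```python
import math


def pearson_r(test_scores, criterion_scores):
    """Pearson correlation between test scores and a criterion."""
    n = len(test_scores)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(test_scores) / n
    mean_y = sum(criterion_scores) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(test_scores, criterion_scores))
    sxx = sum((x - mean_x) ** 2 for x in test_scores)
    syy = sum((y - mean_y) ** 2 for y in criterion_scores)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary to compute a correlation")
    return sxy / math.sqrt(sxx * syy)


def variance_explained(validity_coefficient):
    """Squared validity coefficient: share of criterion variance accounted for."""
    return validity_coefficient ** 2


# Hypothetical validity coefficient of 0.40.
print(round(variance_explained(0.40), 2))  # 0.16, i.e. 16% of criterion variance
```

Even a coefficient that sounds respectable leaves most of the criterion variance unexplained, which is why the squared value is the one to check before deciding a test is worth using.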
Incremental Validity The extent to which a test adds information beyond what is already available from existing measures and research
Construct-related Validity The degree to which a set of measurement operations measures hypothesized constructs: THE MOTHER OF ALL VALIDITIES. Subsumes all activities used in the other types of validity.
Two types of evidence Convergent and discriminant
Convergent construct validity A demonstration of similarity: measures of the same construct converge, or narrow in, on the same thing. Obtained by showing that a test measures the same things as other tests and by demonstrating the specific relationships expected if the test is doing its job.
Discriminant construct validity A demonstration of uniqueness. Low correlations with measures of unrelated constructs, or evidence for what the test does not measure. Also shows that the measure does not represent a construct other than the one it was devised to measure.
What is the relationship between reliability and validity? Reliability limits the magnitude of the validity coefficient. Validity: scores should relate to the construct measured by the test. "Reliability is necessary, but not sufficient, for validity."
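One way to see how reliability limits validity is the standard psychometric bound, in which the maximum validity coefficient equals the square root of the product of the two reliabilities. This is a sketch assuming that bound; the function name and the reliability values are hypothetical.

```python
import math


def max_validity(test_reliability: float, criterion_reliability: float) -> float:
    """Upper bound on the validity coefficient: sqrt(r_test * r_criterion)."""
    for r in (test_reliability, criterion_reliability):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliabilities must lie between 0 and 1")
    return math.sqrt(test_reliability * criterion_reliability)


# Hypothetical reliabilities for a test and a perfectly reliable criterion.
print(round(max_validity(0.50, 1.00), 2))  # 0.71
print(max_validity(0.00, 0.90))            # 0.0: an unreliable test cannot be valid
```

A high bound does not guarantee validity. It only says how large the coefficient could be, which is the "necessary but not sufficient" point.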
Which two types of validity are logical and not statistical? Content validity and face validity
What validity is the mother of all validities? Construct-related validity
What roles does the relationship between examiner and test taker play? The examiner has the ability to affect how well or how poorly a test taker performs on a test.
What is the relationship between the race of the test examiner and intelligence scores? Examiner race has little effect on IQ scores among white and black children. No results show any true effects.
Why would examiner race effects be smaller on IQ tests than on other psychological tests? The effects increase when examiners have more discretion about the test use. If the test is standardized, ethnicity shouldn't matter at all
What is the standard for test takers who are fluent in two languages? They should take the test in the language in which they are most fluent. Interpreters introduce bias.
Expectancy effects (Who did it?) The experimenter is likely to find what they are looking for (Rosenthal)
A review of many studies showed that expectancy effects exist in _________ but not all situations Some
What types of situations might require the examiner to deviate from standardized testing procedures? Any situation that involves individuals with disabilities such as blindness, deafness, or intellectual disability. "Special Situations"
Advantages of Computer Administered Tests Good standardization, timing, test taker is not rushed, control of bias, cost effective
Disadvantages of Computer Administered Tests Computer generated reports, hard to detect errors, people let the computer do too much thinking, no interpersonal interaction
Subject Variables that impact testing Test anxiety, lack of confidence, illness, meds, advanced age, stress, fatigue, hormonal variation, emotionality
Three major problems in behavioral observation studies Reactivity, drift, and expectancies
Reactivity The effect of someone checking work. Reaction to being checked
How does performance change if people are not being checked? Outcomes improve when people think they are being checked. Reliability and accuracy are highest when someone checks on the observers.
Drift Moving away from standard protocols and adopting personal, idiosyncratic ways of administering the test
Drift relation to contrast effect Contrast effect: the tendency to rate the same behavior differently when it is observed in different contexts
Addressing Drift Observers should be trained and retrained on methods
How well do people detect deception? Not well
Halo Effect Attributing positive attributes independently of actual observed behavior
Traits of a good interviewer Key: keep the interaction going without many questions; use open-ended questions to get it started. Social facilitation: we model those around us. Attitude is more important than saying the right thing. Interpersonal influence and interpersonal attraction are related.
How are interviews similar to tests? Both gather data, reliability, validity for both. Many tests cannot be used w/o interview data
Interpersonal influence Degree one can influence another
Interpersonal attraction Degree to which people share a feeling of understanding with one another
Statements that should be avoided Statements that make the interviewee uncomfortable: - Judgmental/evaluative statements - Probing statements - Hostile statements - False reassurance. Avoiding them keeps the interaction flowing.
Transitional phrase Phrase that keeps the conversation going after the interviewee's response to an open-ended question dies down
If all else fails... - Verbatim playback: repeats the interviewee's exact words - Paraphrasing and restatement: repeats the response using different words - Summarizing: pulls together the meaning of several responses - Clarification response: clarifies the interviewee's response
If all fails... Empathy
When should direct questions be used in an interview? -When data cannot be obtained any other way - When time is limited and the interviewer needs specific information - When interviewee can't or won't cooperate
Advantages of structured clinical interviews - Everyone gets the same questions in the same order - Uses specified rules for probing so all interviewees are handled the same - They offer reliability but sacrifice flexibility - Frequently used in research
Disadvantages of structured clinical interviews -Requires cooperation -Relies exclusively on the respondent, making the assumptions questionable
What is the purpose of a mental status examination? Used to evaluate and screen for psychosis, brain damage, and other major psychiatric and neurological difficulties
Areas MSE covers Attention, calculation, recall, and language
Rate of interview reliability for structured interviews Twice as high as for unstructured interviews
Major criticism of structured interviews They do not provide as broad a range of data
Social Facilitation The tendency for people to behave like the models around them Ex. Comfortable interviewer= comfortable interviewee
Largest source of error in interviews Judging is the largest source of interview error
Three independent research traditions identified by Taylor to study human intelligence Psychometric, information-processing, and cognitive
Psychometric approach The oldest approach. Examines the elemental structure of a test and examines test properties by evaluating its correlates and underlying dimensions.
Information-Processing Examines the processes that underlie how we learn and solve problems
Cognitive approach Focuses on how humans adapt to real-world demands
Three facilities through which Binet believed intelligence expressed itself Judgment, attention, and reasoning
Binet's two major concepts Age differentiation and general mental ability
Age differentiation Differentiating older from younger children by the former's greater capacities
Mental Age Equivalent age capabilities of a child regardless of his/her chronological age. Obtained through age differentiation
General Mental Ability The total product of the various separate and distinct elements of intelligence
Positive Manifold When a set of diverse ability tests are administered to large population samples, the correlations are positive
Percentage of children in a certain age group who could complete a certain task 66.67 to 75%
Concept Spearman introduced Intelligence consists of one general factor (g) plus a large number of specific factors
What does Spearman's concept mean? Approximately half of the variance in a set of diverse mental-ability tests is represented in the (g) factor
Statistical method Spearman developed to support his notion of (g) Factor Analysis (reducing a set of variables to a smaller number of factors)
According to the gf-gc (fluid-crystallized) theory, what are the two basic types of intelligence? Fluid and crystallized
Crystallized The knowledge and understanding we have acquired; the actual learning that has occurred
Fluid The abilities that allow us to reason, think, and acquire new knowledge. Abilities that allow us to learn and acquire new information
How to calculate the intelligence quotient (IQ) from mental age IQ = (mental age / chronological age) × 100. MA higher than CA: faster than average. CA higher than MA: slower than average.
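A minimal sketch of the ratio of mental age to chronological age, assuming the conventional scaling by 100 used for the ratio IQ. The function name and the ages are hypothetical examples.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ = (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical 8-year-old children with different mental ages.
print(ratio_iq(10, 8))  # 125.0 -> MA higher than CA, faster than average
print(ratio_iq(8, 8))   # 100.0 -> MA equals CA, average
print(ratio_iq(6, 8))   # 75.0  -> CA higher than MA, slower than average
```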
What is the deviation IQ and how was it used in the Stanford-Binet scale? A standard score (like a z-score) with a mean of 100 and an SD of 16; it replaced the ratio "intelligence quotient"
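The standard-score idea can be sketched as a z-score rescaled to a chosen mean and SD, here the mean of 100 and SD of 16 given above. The function name, the raw score, and the age-group mean and SD are hypothetical values used only to show the arithmetic.

```python
def standard_score(raw: float, group_mean: float, group_sd: float,
                   new_mean: float = 100.0, new_sd: float = 16.0) -> float:
    """Convert a raw score to a standard score via z = (raw - mean) / sd."""
    if group_sd <= 0:
        raise ValueError("group_sd must be positive")
    z = (raw - group_mean) / group_sd
    return new_mean + new_sd * z


# Hypothetical age group with a raw-score mean of 50 and SD of 10.
print(standard_score(60, 50, 10))  # one SD above the mean -> 116.0
print(standard_score(50, 50, 10))  # exactly at the mean -> 100.0
```

Because the score is defined relative to same-age peers, it means the same thing at every age, which the mental age ratio does not.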
Basal A minimum criterion number of correct responses must be obtained before moving forward
Ceiling A specified number of incorrect responses (indicates they have reached their peak performance)
Factors that Wechsler focused on that those before him had not Nonintellective factors; a point scale; a performance scale (the first scale to measure nonverbal intelligence)
Criticisms of the Binet scale by Wechsler The Binet started as a children's scale and was not appropriate for adults. Test takers did not receive credit for completed tasks if certain other criteria were not met; Wechsler thought a point scale was a more appropriate way to measure intelligence.
Criticisms of the Binet scale by Wechsler 2 The Binet did not account for the deterioration of mental age in older adults. The Binet focused highly on speed, which handicapped some participants.
Age range of the Wechsler scales WAIS-III: 13 age groups from 16-17 to 85-89; WISC-IV: 6 yrs to 16 yrs, 11 months; WPPSI-III: 4 yrs to 6 yrs
Why is the inclusion of a point scale a significant improvement? The point scale allowed participants to receive credit for each task completed.
What did a performance scale add? The performance scale allowed participants to "do something" rather than just answer questions; in essence, a measure of nonverbal intelligence
Major functions measured by each subtest of the WAIS-IV (Verbal) Vocabulary: vocabulary knowledge; Similarities: abstract thinking; Arithmetic: concentration; Digit Span: immediate memory, anxiety; Information: range of knowledge; Comprehension: judgment; Letter-Number Sequencing: freedom from distractibility
Major functions measured by each subtest of the WAIS-IV (Performance) Picture Completion: attention to details; Digit Symbol Coding: visual-motor functioning; Block Design: nonverbal reasoning; Matrix Reasoning: inductive reasoning; Picture Arrangement: planning ability; Symbol Search: information-processing speed; Object assem
What are the mean, standard deviation, and range for scaled scores, standard scores, and index scores? Scaled scores M=10 SD=3
How are the IQ scores calculated? Take the sum of the raw scores from the subtests and compare it to the norms for that person's age range
Verbal Comprehension A measure of acquired knowledge and verbal reasoning. Might be best thought of as a measure of crystallized intelligence
Perceptual Organization Believed to be a measure of fluid intelligence Other factors that influence one's performance on this group of tests are attentiveness to details and visual motor integration
Working Memory Refers to the information that we actively hold in our minds in contrast to our stored knowledge, or long-term memory
Processing Speed Attempts to measure how quickly your mind works
What is a hold subtest? A subtest on which you will score the same consistently even if you have a mental illness; i.e., illness should not affect these scores
Which subtests are most sensitive to cerebral dysfunction? Block design, matrix reasoning, similarities
Hold subtests? Vocabulary and information
Disadvantages of the alternative intelligence tests when compared to Binet and Wechsler Poor standardization; less stable; limitations in the test manual; not as psychometrically sound; IQ scores are not interchangeable with the Binet and Wechsler; scores are more specific
Advantages of alternative intelligence tests when compared to Binet and Wechsler Can be used for specific populations and special purposes: language limitations, physical limitations, culturally deprived people, foreign-born people. Not as reliant on verbal responses. Not as reliant on complex visual-motor integration.
Advantages of alternative intelligence tests when compared to Binet and Wechsler Useful for screening, supplement, and reevaluations; can be administered nonverbally; less variability because of scholastic achievement
Theme regarding future intelligence in infant development tests They cannot predict future intelligence
Surveillance Broad observation (test everybody). Ex. Everybody in elementary school is tested for scoliosis.
Screening More direct observation and testing. Ex. Those who test positive for scoliosis during surveillance are then tested in more depth to find the actual diagnosis and level of severity.
Two infant development tests that were discussed Bayley Scales of Infant Development-Second Edition and Brazelton Neonatal Assessment Scale (BNAS)
Sensitivity Find true positives
Specificity Find true negatives (avoid false positives)
True positive Sick, test says sick
False positive Not sick, test says sick
True negative Not sick, test says not sick
False negative Sick, test says not sick
Acceptable sensitivity and specificity levels for developmental screening tests 70-80%
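The four outcomes above combine into sensitivity and specificity as sketched here, using the standard definitions: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The function names and the screening counts are hypothetical, made up only to show the calculation.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of sick cases the test correctly flags: TP / (TP + FN)."""
    total_sick = true_pos + false_neg
    if total_sick == 0:
        raise ValueError("no sick cases to evaluate")
    return true_pos / total_sick


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of not-sick cases the test correctly clears: TN / (TN + FP)."""
    total_not_sick = true_neg + false_pos
    if total_not_sick == 0:
        raise ValueError("no not-sick cases to evaluate")
    return true_neg / total_not_sick


# Hypothetical screening results: 50 sick children, 150 not sick.
print(sensitivity(true_pos=40, false_neg=10))   # 0.8
print(specificity(true_neg=135, false_pos=15))  # 0.9
```

In this made-up example both values meet or exceed the 70-80% range quoted for developmental screening tests.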
Learning disability currently defined in the school systems Significant difference between IQ and achievement
What was the Woodcock-Johnson III designed for? Evaluating learning disabilities in educational settings. It assesses general intellectual ability (g), specific cognitive abilities, scholastic aptitude, oral language, and achievement.
Should test scores be used alone to define developmental or learning disabilities? No. Observation in natural environments and detailed scoring of patterns should be used as well.
