USF - EDF 7437
Advanced Measurement I
Question | Answer |
---|---|
Construct | A hypothetical concept created by social scientists who attempt to develop theories explaining human behavior. A construct can never be absolutely confirmed (it is an unobserved variable). |
Validity | The degree to which the test actually measures what it claims to measure. Test validity is also the extent to which inferences, conclusions, and decisions made on the basis of test scores are appropriate and meaningful. |
Multitrait-Multimethod Matrix | An approach concerned with the adequacy of tests as measures of a construct, rather than with the adequacy of the construct itself. It requires measuring two or more traits, each by two or more methods. |
Nomological network | To provide evidence that your measure has construct validity, you have to develop a nomological network that includes the theoretical framework for what you are trying to measure, an empirical framework for how you are going to measure it, and specific |
Reliability | the "consistency" or "repeatability" of your measures |
Cronbach's alpha | A measure of internal consistency reliability. It is mathematically equivalent to the average of all possible split-half estimates. |
Random measurement error | caused by any factors that randomly affect measurement of the variable across the sample. For instance, each person's mood can inflate or deflate their performance on any occasion. |
Systematic measurement error | caused by any factors that systematically affect measurement of the variable across the sample. Unlike random error, systematic errors tend to be consistently either positive or negative. |
Bias | external influences that may affect the accuracy of statistical measurements. |
Norm-referenced test score interpretation | type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population, with respect to the trait being measured. |
Criterion-referenced test score interpretation | One that translates test scores into a statement about the behavior to be expected of a person with that score or about the person's relationship to a specified subject matter. Most tests and quizzes written by school teachers are criterion-referenced. |
Standard deviation | a measure of the dispersion of a collection of values. It can apply to a probability distribution, a random variable, a population or a data set. (0 to infinity) |
Pearson product moment correlation | A common measure of the correlation between two variables X and Y; it ranges from -1 to 1. When one variable is dichotomous (e.g., a 0/1 item score), it is called the point-biserial correlation. |
Covariance of 2 variables | A measure of how much two variables change together. It is the unstandardized (raw) counterpart of the Pearson product-moment correlation: cov(X, Y) = r × SD_X × SD_Y. It has no fixed range, which makes it difficult to interpret. |
Nominal scale of measurement | Categories without order |
Ordinal scale of measurement | Categories with order |
Interval scale of measurement | Categories with order and equal intervals between categories (e.g., test scores from 0 to 100) |
Ratio scale of measurement | Categories with order, equal intervals between categories, and a true zero point |
Unidimensional | One theme or factor underlies the test. For example, a math test that requires difficult reading is not unidimensional and therefore violates this assumption |
Exploratory Factor Analysis | A procedure to explain variability among observed variables in terms of fewer unobserved variables called factors. Factor analysis is exploratory when the researcher has no prior hypothesis about the factors, and confirmatory otherwise |
Standard score | z score |
Item difficulty - part of CTT | The proportion of examinees answering the item correctly (p, the item mean when items are scored 0/1); the lower the value, the more difficult the item |
Item variance | p × q, where p is the proportion correct and q = 1 − p |
Types of Test theories | Classical Test Theory (True-Score Model), Generalizability Theory, and Item Response Theory (Latent Trait Theory) |
Psychometric Theory / Test Theory | Broad term used to tie together a collection of concepts and techniques related to measurement in education, psychology, and related disciplines. |
Spearman | 1904 - Classical Test theory - Factor analysis |
Gulliksen | 1950 - Classical Test Theory |
Lord and Novick | 1968 - Classical Test Theory |
Classical Test Theory (CTT) - equation | X = T + E, where X is the observed score, T is the true score (latent), and E is the error (latent and random) |
Classical Test Theory - assumptions | 1. The expected value of the error is 0 (the mean of the measurement errors is 0); 2. The correlation between T and E is 0; 3. The correlation between error scores on different measurements is 0 |
Item discrimination | indicates the extent to which success on an item corresponds to success on the whole test |
Karl Pearson | Correlation |
Alfred Binet | Intelligence testing |
E.L. Thorndike | Test theory |
Variance | Ranges from 0 (when all the scores are the same) to infinity; Var(aX + b) = a²·Var(X) |
Deviation score | The score minus the mean; this value may be negative |
Standard score (or z-score) | A dimensionless quantity derived by subtracting the population mean from an individual raw score and then dividing the difference by the population standard deviation. This is a linear transformation, so it does not change the shape of the distribution. |
Cronbach, Gleser, Nanda, and Rajaratnam | Generalizability Theory - 1972 |
Brennan, R. | Generalizability Theory - 1983 |
Shavelson, R and Webb, N.M. | Generalizability Theory - 1991 |
Generalizability Theory | X = T + (E1 + E2 + E3 + ... + Ek). The undifferentiated sources of error in CTT can be modeled as separate sources (e.g., rater, occasion). It is the union of CTT and analysis of variance, and it answers questions such as how many raters or occasions are needed. |
Generalizability Theory - applications | 1. Evaluate the multiple sources of error that may operate in a particular measurement context; 2. Use this information to make decisions about how best to improve the reliability of scores |
Facet (Generalizability Theory) | A characteristic of the testing situation, such as occasion, prompt, rater, or item |
Condition (Generalizability Theory) | Level of a facet |
Universe of admissible observations (Generalizability Theory) | Defined by the sets of measurement conditions that are relevant for a particular measurement context |
Thurstone | IRT - 1925 - "A Method of Scaling Psychological and Educational Tests" |
Lazarsfeld | IRT - 1950 - "A Logical and Mathematical Foundation of Latent Structure Analysis" |
Guttman | IRT - 1950 - "The Basis for Scalogram Analysis" |
Lord | IRT - 1952 - "A Theory of Test Scores" |
Birnbaum | IRT - 1957, 1958 |
Rasch | IRT - 1961 - "On General Laws and the Meaning of Measurement in Psychology" |
Wright | IRT - 1968 - "Sample-free Test Calibration and Person Measurement" |
Samejima | IRT - 1969 - "Estimation of Latent Trait Ability Using a Response Pattern of Graded Scores" |
Bock | IRT - 1972 - "Estimating Item Parameters and Latent Ability When Responses are Scored in 2 or More Nominal Categories" |
Andrich | IRT - 1978 - "A Rating Formulation for Ordered Response Categories" |
Wright and Stone | IRT - 1979 - Best Test Design |
Baker | IRT - 1985 - The Basics of Item Response Theory |
Classical Test Theory - limitations | 1. Item statistics such as item difficulty and item discrimination are sample dependent; 2. Comparisons of students who have taken tests with items of different difficulty are hard to make; 3. Does not provide information about how examinees at different |
Uses of Item Response Theory | 1. Test Development; 2. Equating; 3. Differential Item Functioning (DIF); 4. Computer Adaptive Testing |
Item Response Theory Assumptions | 1. A unidimensional trait is being measured; 2. Pairs of items are statistically independent after holding constant the latent trait score - Local Independence / Conditional Independence - (e.g., for a given ability level pairs of items are statistically in |
Item Response Theory Models | 1. One-Parameter Logistic (Rasch): difficulty parameter = b; 2. Two-Parameter Logistic: difficulty parameter = b and discrimination parameter = a; 3. Three-Parameter Logistic: difficulty parameter = b, discrimination parameter = a, pseudo-chance level para |
Log Odds or Logit | If p is the probability of passing and q = 1 − p is the probability of failing, the odds are p/q and range from 0 to infinity. The log odds is ln(odds) = ln(p/q) and ranges from -infinity to +infinity |
Content Validity | check the operationalization against the relevant content domain for the construct |
Predictive Validity | The operationalization's ability to predict something it should theoretically be able to predict |
Convergent Validity | the degree to which the operationalization is similar to (converges on) other operationalizations that it theoretically should be similar to |
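As a minimal sketch of the Cronbach's alpha entry in the table above, the function below uses the usual variance form, alpha = k/(k − 1) × (1 − sum of item variances / total-score variance), for a score matrix with one row per examinee and one column per item. The function name and data layout are illustrative assumptions, not taken from the course materials.

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha; scores[i][j] is examinee i's score on item j."""
    k = len(scores[0])  # number of items
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Variance of each item across examinees (population form; the n vs. n-1
    # choice cancels out as long as items and totals use the same divisor).
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of the examinees' total scores.
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Two items that order examinees identically give alpha = 1;
# two uncorrelated items give alpha = 0.
print(cronbach_alpha([[1, 1], [2, 2], [3, 3]]))
print(cronbach_alpha([[1, 0], [0, 1], [1, 1], [0, 0]]))
```

The toy matrices are made up only to show the two extremes of internal consistency.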
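The standard deviation, Pearson correlation, and covariance entries above are linked by cov(X, Y) = r × SD_X × SD_Y. The sketch below, using hypothetical helper names and made-up data, computes all three with population (divide-by-N) formulas and checks that link. It also shows why covariance "has no range limit": rescaling Y changes the covariance but not r.

```python
from math import sqrt


def covariance(x: list[float], y: list[float]) -> float:
    """Population covariance: mean product of paired deviation scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def sd(x: list[float]) -> float:
    """Population standard deviation: square root of x's covariance with itself."""
    return sqrt(covariance(x, x))


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation: covariance rescaled by both SDs."""
    return covariance(x, y) / (sd(x) * sd(y))


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
r = pearson_r(x, y)
# Covariance equals r rescaled by the two SDs, so it carries the units of X and Y...
print(covariance(x, y), r * sd(x) * sd(y))
# ...whereas r is unchanged when Y is multiplied by 10.
print(r, pearson_r(x, [10 * v for v in y]))
```

Passing a 0/1 variable as one of the arguments gives the point-biserial correlation mentioned in the Pearson entry.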
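For the item difficulty and item variance entries above, a small sketch assuming dichotomously scored (0/1) items: difficulty p is the proportion correct, and the item variance is p × q. The response matrix and function names are hypothetical illustrations.

```python
def item_difficulty(responses: list[list[int]], item: int) -> float:
    """CTT item difficulty p: proportion of examinees scoring 1 on the item."""
    column = [row[item] for row in responses]
    return sum(column) / len(column)


def item_variance(p: float) -> float:
    """Variance of a 0/1 item: p * q with q = 1 - p (largest, 0.25, when p = 0.5)."""
    return p * (1 - p)


# Rows are examinees, columns are items; 1 = correct, 0 = incorrect (made-up data).
responses = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
for j in range(3):
    p = item_difficulty(responses, j)
    print(f"item {j}: p = {p:.2f}, pq = {item_variance(p):.4f}")
```

Item 1 has the lowest p, so under the CTT convention it is the most difficult item in this toy set.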
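The CTT equation and assumptions entries above can be checked with a quick simulation. This is a sketch under assumed, hypothetical parameters (true scores with mean 50 and SD 10, errors with mean 0 and SD 5, drawn independently). Because T and E are uncorrelated, var(X) is close to var(T) + var(E). The ratio var(T)/var(X), the usual CTT definition of reliability, should land near 100 / 125 = 0.8.

```python
import random
from statistics import fmean


def var(a: list[float]) -> float:
    """Population variance."""
    m = fmean(a)
    return fmean([(v - m) ** 2 for v in a])


def corr(a: list[float], b: list[float]) -> float:
    """Pearson correlation between two equal-length lists."""
    ma, mb = fmean(a), fmean(b)
    cov = fmean([(u - ma) * (v - mb) for u, v in zip(a, b)])
    return cov / (var(a) * var(b)) ** 0.5


def simulate_ctt(n: int = 20_000, true_mean: float = 50.0, true_sd: float = 10.0,
                 error_sd: float = 5.0, seed: int = 1):
    """Simulate X = T + E with E random, mean 0, and drawn independently of T."""
    rng = random.Random(seed)
    t = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    e = [rng.gauss(0.0, error_sd) for _ in range(n)]
    x = [ti + ei for ti, ei in zip(t, e)]
    return t, e, x


t, e, x = simulate_ctt()
print("mean error:", fmean(e))        # near 0 (assumption 1)
print("corr(T, E):", corr(t, e))      # near 0 (assumption 2)
print("var(X):", var(x), "var(T) + var(E):", var(t) + var(e))
print("reliability var(T)/var(X):", var(t) / var(x))
```

The printed values vary slightly with the seed; only the approximate relationships are the point.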
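The item discrimination entry above does not name a specific index. One common CTT choice, used here as an assumption, is the corrected item-total correlation: the point-biserial correlation between an item and the total score on the remaining items. The upper-lower group difference is another index this sketch does not cover. The data and function names are hypothetical.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation from sums of squared and cross-product deviations."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_discrimination(responses: list[list[int]], item: int) -> float:
    """Corrected item-total correlation: item score vs. total on the other items."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    return pearson(item_scores, rest_scores)


# Made-up 0/1 responses: rows are examinees, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
]
for j in range(4):
    print(j, round(item_discrimination(responses, j), 3))
```

Item 3 is answered correctly only by the low scorers, so its discrimination is negative, which would usually flag it for review.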
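The variance, deviation score, and standard score entries above fit together in a few lines. This sketch assumes a made-up set of raw scores and population (divide-by-N) statistics, matching the entry's "population standard deviation". It also checks the rule Var(aX + b) = a²·Var(X).

```python
from statistics import mean, pstdev, pvariance


def deviation_scores(scores: list[float]) -> list[float]:
    """Deviation score: raw score minus the mean (can be negative)."""
    m = mean(scores)
    return [x - m for x in scores]


def z_scores(scores: list[float]) -> list[float]:
    """Standard (z) score: deviation score divided by the population SD."""
    s = pstdev(scores)
    return [d / s for d in deviation_scores(scores)]


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up raw scores; mean 5, population SD 2
print(deviation_scores(scores))    # deviations sum to 0
print(z_scores(scores))            # mean 0, SD 1; same shape as the raw scores
# Var(aX + b) = a**2 * Var(X): adding b shifts the scores without changing spread.
a, b = 3, 10
print(pvariance([a * x + b for x in scores]), a ** 2 * pvariance(scores))
```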
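The Generalizability Theory entries above describe splitting error into facets and then deciding how many raters or occasions to use. A minimal sketch for the simplest case follows, assuming a fully crossed one-facet persons × raters design with one rating per cell. A G study estimates variance components from two-way ANOVA mean squares. A D study then projects the relative generalizability coefficient, var_p / (var_p + var_res / n), for a different number of raters n. Function names and ratings are hypothetical, and multi-facet designs need additional components.

```python
def g_study(scores: list[list[float]]) -> tuple[float, float, float]:
    """Variance components for a crossed p x r design; scores[p][r] is one rating."""
    n_p, n_r = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_r)
    p_means = [sum(row) / n_r for row in scores]
    r_means = [sum(row[r] for row in scores) / n_p for r in range(n_r)]
    # Two-way ANOVA sums of squares. With one rating per cell, the person-by-rater
    # interaction and residual error are confounded in a single term.
    ss_p = n_r * sum((m - grand) ** 2 for m in p_means)
    ss_r = n_p * sum((m - grand) ** 2 for m in r_means)
    ss_tot = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_tot - ss_p - ss_r
    ms_p = ss_p / (n_p - 1)
    ms_r = ss_r / (n_r - 1)
    ms_res = ss_res / ((n_p - 1) * (n_r - 1))
    # Expected-mean-square solutions; negative estimates are set to 0,
    # a common convention.
    var_p = max((ms_p - ms_res) / n_r, 0.0)
    var_r = max((ms_r - ms_res) / n_p, 0.0)
    var_res = ms_res
    return var_p, var_r, var_res


def g_coefficient(var_p: float, var_res: float, n_raters: int) -> float:
    """Relative G coefficient when each person's score averages n_raters ratings."""
    return var_p / (var_p + var_res / n_raters)


# Made-up ratings: 4 persons (rows) scored by the same 3 raters (columns).
ratings = [[4, 5, 4], [6, 7, 7], [2, 3, 2], [8, 8, 9]]
var_p, var_r, var_res = g_study(ratings)
print(var_p, var_r, var_res)
# D study: how does the coefficient change with the number of raters?
for n in (1, 2, 3, 5):
    print(n, round(g_coefficient(var_p, var_res, n), 3))
```

The rater main effect (var_r) does not enter the relative coefficient. It would matter for absolute decisions, which this sketch does not compute.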
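The Item Response Theory Models and Log Odds entries above can be tied together in one sketch. It assumes the standard logistic form P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))), where c is the pseudo-chance parameter and D is an optional scaling constant (1.7 is often used to approximate the normal ogive). Setting c = 0 gives the 2PL; also holding a fixed gives the 1PL. For the 2PL with D = 1, the logit of P is exactly a(θ − b). The function names are hypothetical.

```python
from math import exp, log


def irt_probability(theta: float, b: float, a: float = 1.0, c: float = 0.0,
                    D: float = 1.0) -> float:
    """Probability of a correct response under the logistic IRT models.

    1PL: vary only b; 2PL: also vary a; 3PL: also vary c (pseudo-chance level).
    """
    return c + (1 - c) / (1 + exp(-D * a * (theta - b)))


def logit(p: float) -> float:
    """Log odds ln(p / q) with q = 1 - p; defined only for 0 < p < 1."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    return log(p / (1 - p))


# At theta = b, a 1PL/2PL item is answered correctly half the time,
# while a 3PL item sits halfway between c and 1.
print(irt_probability(0.5, b=0.5))
print(irt_probability(0.5, b=0.5, a=1.2, c=0.2))
# For the 2PL (c = 0, D = 1), the logit of P is linear in theta: a * (theta - b).
p = irt_probability(1.3, b=0.3, a=1.5)
print(logit(p))
```

The linear logit is why IRT difficulty and ability share one scale: b is the point on the θ scale where the log odds of success (ignoring c) equal 0.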