Test and Measurement


Question

Answer

Reliability

The consistency or stability of scores.


Classical Test Theory

A theory of reliability that can be demonstrated with mathematical proofs: X = T + E (observed score = true score + error).


Random Measurement Error

All measurement is susceptible to error. Any factor that introduces error into the measurement process affects reliability.


Random measurement error is also referred to as what?

Unsystematic Error


How do we increase our confidence in scores?

We try to detect, understand, and minimize random measurement error.


Content Sampling Error

A major source of random measurement error. Results from differences between the sample of items (i.e., the test) and the domain of items (i.e., all possible items).


Time Sampling Error/Temporal Instability

Random and transient influences on scores: situation-centered influences (e.g., lighting and noise) and person-centered influences (e.g., fatigue, illness).


Other Sources of Error

Administration errors (e.g., incorrect instructions, inaccurate timing) and Scoring errors (e.g., subjective scoring, clerical errors).


Classical Test Theory

X = T + E


X

Obtained or observed score (fallible)


T

True score (reflects stable characteristics)


E

Error score (reflects random error)
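
Study note: a minimal simulation sketch of X = T + E, not part of the original cards. The means, standard deviations, and sample size are made-up assumptions. It treats reliability as the share of observed-score variance that comes from true scores, which is the usual Classical Test Theory reading.

```python
import random
import statistics

def simulate_scores(n_people=5000, true_mean=100.0, true_sd=15.0,
                    error_sd=5.0, seed=0):
    """Simulate observed scores as X = T + E (all parameters are hypothetical)."""
    rng = random.Random(seed)
    # T: stable characteristic of each person
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_people)]
    # E: random error, centered on zero and unrelated to T
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_people)]
    # X: the fallible score we actually observe
    observed = [t + e for t, e in zip(true_scores, errors)]
    return observed, true_scores, errors

observed, true_scores, errors = simulate_scores()

# Reliability as the ratio of true-score variance to observed-score variance
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)
print(round(reliability, 2))
```

With these assumed values the ratio should land near 15² / (15² + 5²) = .90, give or take sampling noise.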


Test-Retest Reliability

Reflects the temporal stability of a measure. Most applicable to tests administered more than once and/or to constructs that are viewed as stable. It is important to consider the length of the interval between administrations.


Test-Retest Reliability estimates are subject to what?

“Carry-over effects” (e.g., practice or memory from the first administration). Test-retest estimates are appropriate for tests that are not appreciably impacted by carry-over effects.
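
Study note: a test-retest coefficient is typically the correlation between scores from the two administrations. The sketch below computes a Pearson correlation by hand; the score lists are invented for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

# Hypothetical scores for the same six people at two points in time
time1 = [12, 15, 9, 20, 17, 11]
time2 = [13, 14, 10, 19, 18, 12]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 3))
```

The same function applied to two parallel forms would give an alternate-form estimate instead.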


Alternate Form Reliability

Involves the administration of two “parallel” forms.


Delayed Administration for Alternate Form Reliability

Reflects error due to both temporal instability (time sampling) and content sampling.


Simultaneous Administration for Alternate Form Reliability

Reflects only error due to content sampling.


Alternate Form Reliability limitations

Reduces, but may not eliminate, carry-over effects. Relatively few tests have alternate forms.


Internal Consistency Reliability

Estimates of reliability that are based on the relationship between items within a test and are derived from a single administration of a test. Split-half reliability is one such estimate.


Split-Half Reliability reflects what type of error?

Error due to content sampling.
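
Study note: a sketch of one common way to get a split-half estimate, assuming an odd/even item split and the Spearman-Brown correction to step the half-test correlation up to full length. The response matrix is invented (rows are people, columns are items scored 0 or 1).

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)

def split_half_reliability(item_scores):
    """Odd/even split-half estimate with the Spearman-Brown correction."""
    # Half-test totals: items 1, 3, 5, ... versus items 2, 4, 6, ...
    first_half = [sum(person[0::2]) for person in item_scores]
    second_half = [sum(person[1::2]) for person in item_scores]
    r_half = pearson_r(first_half, second_half)
    # Each half is only half as long as the full test, so correct upward
    return 2 * r_half / (1 + r_half)

# Hypothetical responses: 6 people x 6 items
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

print(round(split_half_reliability(responses), 3))
```

A different way of splitting the items would generally give a somewhat different estimate.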


Coefficient Alpha & Kuder-Richardson (KR 20)

Reflects error due to content sampling. Sensitive to the heterogeneity of the test content (or item homogeneity). Coefficient Alpha is the mathematical average of what?
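
Study note: a sketch of the standard coefficient alpha formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The response data are invented. Because these items are scored 0 or 1 and population variances are used, the result here is also the KR-20 value.

```python
import statistics

def coefficient_alpha(item_scores):
    """Coefficient alpha from a people-by-items matrix of scores."""
    n_items = len(item_scores[0])
    # Variance of each item across people
    item_variances = [
        statistics.pvariance([person[i] for person in item_scores])
        for i in range(n_items)
    ]
    # Variance of the total test score across people
    total_variance = statistics.pvariance([sum(person) for person in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 6 people x 6 items scored 0/1
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0],
]

alpha = coefficient_alpha(responses)
print(round(alpha, 3))
```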


Reliability of Speed Tests

For speed tests, reliability estimates derived from a single administration of a test are inappropriate. For speed tests, test-retest and alternate-form reliability are appropriate, but split-half, Coefficient Alpha and KR 20 should be avoided.


Inter-Rater Reliability

Reflects differences due to the individuals scoring the test. Important when scoring requires subjective judgement by the scorer.
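
Study note: the card does not name a statistic. One common index for two raters assigning categories is Cohen's kappa, which adjusts observed agreement for agreement expected by chance. The sketch below assumes that choice, and the ratings are invented.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who classified the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category proportions
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail judgments by two scorers on eight essays
scorer_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass"]
scorer_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

kappa = cohens_kappa(scorer_1, scorer_2)
print(round(kappa, 3))
```

For continuous scores, a correlation between the two raters' scores is another common choice.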


Reliability of Difference Scores

Difference scores are calculated when comparing performance on two tests. Why is the reliability of the difference between two test scores generally lower than the reliabilities of the two tests?
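
Study note: a sketch of the textbook formula for the reliability of a difference score, assuming both tests are on the same scale with equal variances. The reliabilities and the correlation between tests are made-up values.

```python
def difference_score_reliability(r_aa, r_bb, r_ab):
    """Reliability of the difference A - B, assuming equal test variances.

    r_aa, r_bb: reliabilities of the two tests
    r_ab: correlation between the two tests
    """
    return (0.5 * (r_aa + r_bb) - r_ab) / (1 - r_ab)

# Hypothetical values: two fairly reliable tests that correlate .70
r_diff = difference_score_reliability(r_aa=0.90, r_bb=0.85, r_ab=0.70)
print(round(r_diff, 3))
```

The formula shows the pattern behind the question: the more the two tests correlate, the more shared true-score variance is subtracted out, while the error from both tests remains in the difference.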


Reliability of Composite Scores

When there are multiple scores available for an individual one can calculate composite scores (e.g., assigning grades in class). What is the important issue with reliability of a composite score?


Standards for Reliability in making important decisions

If a test score is used to make important decisions that will significantly impact individuals, the reliability should be very high (commonly cited benchmarks are .90 or higher).


Standards for Reliability if part of an assessment

If a test is interpreted independently but as part of a larger assessment process (e.g., personality test), most set the standard at .80 or greater.


Standards for Reliability in research or composite score

If a test is used only in group research or is used as a part of a composite (e.g., classroom tests), lower reliability estimates may be acceptable (e.g., .70s).


Improving Reliability

Increase the number of items (i.e., better domain sampling). Use multiple measurements (i.e., composite scores). Use “Item Analysis” procedures to select the best items. Increase standardization of the test (e.g., administration and scoring).
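
Study note: the effect of adding items is usually projected with the Spearman-Brown prophecy formula, which assumes the new items are comparable in quality to the existing ones. A minimal sketch with made-up numbers:

```python
def spearman_brown(current_reliability, length_factor):
    """Projected reliability when test length is multiplied by length_factor.

    Assumes the added items are comparable to the existing items.
    """
    r = current_reliability
    n = length_factor
    return (n * r) / (1 + (n - 1) * r)

# Hypothetical test with reliability .70, doubled in length
projected = spearman_brown(current_reliability=0.70, length_factor=2)
print(round(projected, 3))
```

A length factor below 1 projects the reliability of a shortened test.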


Standard Error of Measurement (SEM)

When comparing the reliability of tests, the reliability coefficient is the statistic of choice. When interpreting individual scores, the SEM generally proves to be the most useful statistic.


The SEM is an index of what?

The average amount of error in test scores. Technically, the SEM is the standard deviation of error scores around the true score.


How is the SEM calculated?

Using the reliability coefficient (rxx) and the standard deviation (SD) of the test.
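
Study note: the usual formula is SEM = SD × √(1 − rxx). A minimal sketch, with an assumed SD of 15 and an assumed reliability of .91:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, rxx = .91
sem = standard_error_of_measurement(sd=15, reliability=0.91)
print(round(sem, 2))

# Lower reliability with the same SD gives a larger SEM
sem_lower = standard_error_of_measurement(sd=15, reliability=0.75)
print(round(sem_lower, 2))
```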


What is the relationship between rxx and SEM?

Since the test’s reliability coefficient is used in calculating the SEM, rxx and SEM are directly linked, and the relationship is inverse: as the reliability of a test increases, the SEM decreases; as reliability decreases, the SEM increases.


Confidence Intervals

Reflect a range of scores that will contain the test taker’s true score with a prescribed probability. What is used to calculate confidence intervals?


A major advantage of confidence intervals

is that they remind us that measurement error is present in all scores and we should interpret scores cautiously.


Confidence intervals are interpreted

“The range within which a person’s true score is expected to fall ___% of the time,” where the blank is the chosen confidence level (e.g., 95%).


What is the relationship between SEM and confidence intervals?

Since the SEM is used in calculating confidence intervals, there is a direct relationship between the SEM and confidence intervals. The size of confidence intervals increases as the SEM increases. The size of confidence intervals decreases as the reliability of the test increases.
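
Study note: a sketch of a simple confidence interval built as observed score ± z × SEM. The score, SD, and reliability are made-up values, and centering the interval on the observed score is an assumption (some texts center it on an estimated true score instead).

```python
import math
from statistics import NormalDist

def confidence_interval(observed_score, sd, reliability, level=0.95):
    """Interval of observed_score +/- z * SEM for the given confidence level."""
    sem = sd * math.sqrt(1 - reliability)
    # Two-sided z value, e.g. about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return observed_score - z * sem, observed_score + z * sem

# Hypothetical case: observed score 100, SD = 15, rxx = .91
low, high = confidence_interval(100, sd=15, reliability=0.91)
print(round(low, 2), round(high, 2))

# Same score on a less reliable test gives a wider interval
low_wide, high_wide = confidence_interval(100, sd=15, reliability=0.75)
print(round(low_wide, 2), round(high_wide, 2))
```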


A Primer on Generalizability Theory

An extension of Classical Test Theory. Classical Test Theory treats all error as random; Generalizability Theory also recognizes sources of systematic error.


When are Classical Test Theory and Generalizability Theory mathematically identical?

When there is no opportunity for systematic error to enter the model, Classical Test Theory and Generalizability Theory are mathematically identical.


Classical Test Theory is most useful when

objective tests are administered under standardized conditions (e.g., SAT or GRE).


When is Generalizability Theory useful?

If the conditions that favor Classical Test Theory are not met, the principles raised by Generalizability Theory may be useful (e.g., for essay or projective tests).



Created by: kxiong