Week 4. PSY 3041

Item Analysis and the Meaning of Test Scores

Question Answer
Review: What are the 5 stages of the test development process? 1. Test Conceptualisation 2. Test Construction 3. Test Tryout 4. Item Analysis 5. Test Revision
Review: Two motives for test conceptualisation? 1. No test is available to measure the given construct. 2. The available tests may lack the psychometric properties you want, e.g. a test may not be valid, may be unreliable, or may have poor internal consistency.
Review: Main tasks in test construction phase? These tasks are linked: 1. Writing the items and choosing item format (selected response formats and constructed response formats) 2. Choosing an appropriate scale associated with the items.
Review: Test Tryout? This is where you have a bank of items you have created and you test how good they actually are.
What makes a good item? Similar to what makes a good test: 1. Reliable 2. Valid. Importantly, a good item also: 3. Discriminates between test takers. Generally, a good test item will be answered correctly by people who do well on the test as a whole, and incorrectly by people who do poorly.
What is item analysis? It is the analysis of individual items as well as the overall test scores. There are both quantitative and qualitative approaches, with the former more common. Item analysis leads to items being kept, revised or discarded.
Quantitative Item Analysis? This involves looking at means, standard deviations, distributions of scores we obtain etc.
Qualitative Item Analysis? Really useful when we want to understand why some items aren't performing as expected
What are the tools for item analysis used by test developers? There are four main types; the first two are the most fundamental: 1. Indices of item difficulty 2. Indices of item discrimination 3. Indices of item reliability 4. Indices of item validity
Does it matter which of the four item analysis tools you use? The four tools tend to yield similar decisions. There will be some differences, but good items are good on a number of different dimensions, so you tend to arrive at similar judgments irrespective of which tool is used.
What is the item difficulty index? The idea of item difficulty applies to each item. It is calculated as the proportion of test-takers who answered the item correctly (p). p can range from 0 (no one answered correctly) to 1 (everyone answered correctly), so a higher p means an easier item.
What is the overall item difficulty index? Found by averaging the individual p values (the difficulty index of each item in the test), yielding the overall item difficulty of the test. Ideally the average item difficulty is 0.5, with individual item difficulties generally ranging from 0.3 (quite hard) to 0.8 (quite easy).
How does the item difficulty index help in the item analysis process? Items with p values close or equal to 1 have been answered correctly by most or all test takers. Items with p values close or equal to 0 have been answered incorrectly by most or all. These items don't discriminate between test takers and should be discarded or revised.
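The three difficulty cards above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names, the score matrix and the 0.3 to 0.8 cut-offs (taken from the "generally ranges from" guideline above) are assumptions made for the example.

```python
def item_difficulties(score_matrix):
    """Return p for each item: the proportion of test-takers answering it correctly.

    score_matrix has one row per test-taker and one column per item,
    with 1 for a correct answer and 0 for an incorrect one.
    """
    n_takers = len(score_matrix)
    n_items = len(score_matrix[0])
    return [sum(row[i] for row in score_matrix) / n_takers for i in range(n_items)]


def average_difficulty(p_values):
    """Average the item p values to get the overall difficulty of the test."""
    return sum(p_values) / len(p_values)


def flag_items(p_values, low=0.3, high=0.8):
    """Return the positions of items whose p falls outside the assumed range."""
    return [i for i, p in enumerate(p_values) if p < low or p > high]


# Hypothetical data: 5 test-takers (rows) by 4 items (columns).
scores = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 0, 0, 0],
]
p_values = item_difficulties(scores)
print(p_values)                      # [1.0, 0.4, 0.2, 0.6]
print(average_difficulty(p_values))  # overall difficulty of the test
print(flag_items(p_values))          # [0, 2]
```

Here the first item (everyone correct) and the third item (almost no one correct) are flagged as candidates for revision or removal.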
How does guessing (in selected response format questions) impact item difficulty? Guessing must be accounted for in selected response formats. The optimal average item difficulty is midway between 1 and the probability of guessing correctly, e.g. in true-false the probability of guessing correctly is 0.5, so the optimal average item difficulty is p = 0.75.
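The midpoint rule above is simple arithmetic. The sketch below assumes that every option is equally likely to be picked by a guesser, so the chance of guessing correctly is 1 divided by the number of options; the function name is hypothetical.

```python
def optimal_difficulty(n_options):
    """Midpoint between 1.0 and the probability of guessing correctly.

    Assumes a blind guess picks each of the n_options with equal probability.
    """
    if n_options < 2:
        raise ValueError("a selected-response item needs at least two options")
    chance = 1 / n_options
    return (1 + chance) / 2


print(optimal_difficulty(2))  # true-false: 0.75
print(optimal_difficulty(4))  # four-option multiple choice: 0.625
```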
Item reliability? When we talk about reliability, we are always referring to consistency, whether this is across time, among items or between judges etc.
What is the item reliability index? The item reliability index provides an indication of the internal consistency of a test. The higher the index, the greater the test's internal consistency.
How do we approach item reliability? 1. Calculation of individual item reliability 2. Calculation of test reliability/scale reliability. As well as referring to measurement and response scales, "scale" also refers to a cluster of items within a test that measure the same thing.
Calculation of Reliability of Individual Items The reliability index of an individual item is calculated as: item score standard deviation (s1) x the correlation between the item score and the total test score (r1t)
r - correlation between item and overall test A high correlation (r) suggests a good item: the item is measuring the same thing as the test. A low correlation (r) suggests a poor item: the item is measuring something slightly or completely different from the test as a whole.
s - item score standard deviation The item score standard deviation of item 1 is denoted by s1. s1 = sqrt(p1(1-p1)), where p1 is the item difficulty index for item 1.
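A minimal sketch of the item reliability index, s1 x r1t, for a dichotomous (0/1) item. The helper names and the six test-takers' scores are made up for illustration, and the correlation is an ordinary Pearson r written out by hand.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def item_sd(item_scores):
    """s = sqrt(p(1 - p)) for a 0/1 item, where p is the item difficulty."""
    p = sum(item_scores) / len(item_scores)
    return math.sqrt(p * (1 - p))


def item_reliability_index(item_scores, total_scores):
    """Item standard deviation times the item-total correlation."""
    return item_sd(item_scores) * pearson_r(item_scores, total_scores)


# Hypothetical data: one item's 0/1 scores and the same people's total scores.
item_1 = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 5, 4, 3]
print(round(item_reliability_index(item_1, totals), 3))  # 0.463
```

In this made-up data the high scorers all pass the item and the low scorers all fail it, so the item-total correlation is high (about 0.93).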
Calculation of reliability of scales - in this case, scales refers to clusters of items within a test Reliability of clusters of items, i.e. internal/inter-item consistency, demonstrates the homogeneity of the test/scale. If the response scale is: 1. Dichotomous, e.g. true-false: use Kuder-Richardson Formula 20 2. Non-dichotomous, e.g. Likert: use Cronbach's alpha
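Both coefficients can be sketched with the standard library. The version below is an illustration under two assumptions: population variances are used throughout (textbooks differ on this convention), and the response matrices are invented. With 0/1 items the two formulas give the same value, which is why KR-20 is often described as a special case of Cronbach's alpha.

```python
from statistics import pvariance


def cronbach_alpha(score_matrix):
    """Cronbach's alpha; one row per test-taker, one column per item."""
    k = len(score_matrix[0])
    item_variances = [pvariance([row[i] for row in score_matrix]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def kr20(score_matrix):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) items."""
    k = len(score_matrix[0])
    n = len(score_matrix)
    p_values = [sum(row[i] for row in score_matrix) / n for i in range(k)]
    sum_pq = sum(p * (1 - p) for p in p_values)
    total_variance = pvariance([sum(row) for row in score_matrix])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)


# Hypothetical true-false responses (1 = correct, 0 = incorrect).
true_false = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
# Hypothetical 5-point Likert responses.
likert = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 4, 5],
    [2, 2, 1],
    [3, 4, 3],
]
print(round(kr20(true_false), 3))        # 0.8
print(round(cronbach_alpha(likert), 3))  # 0.933
```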
Item validity index Validity is the degree to which a test measures what it purports to measure. Criterion validity is one way to measure validity. Calculation: item validity index = item score standard deviation (s1) x the correlation between the item score and the score on the criterion measure (r1c). Higher r = better item.
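As a worked illustration of this calculation, with hypothetical values chosen only to make the arithmetic easy (p1 = 0.5 and r1c = 0.6 are assumptions, not course data):

```latex
\begin{align*}
s_1 &= \sqrt{p_1(1 - p_1)} = \sqrt{0.5 \times 0.5} = 0.5 \\
\text{item validity index} &= s_1 \, r_{1c} = 0.5 \times 0.6 = 0.30
\end{align*}
```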
Item discrimination Good items discriminate between people. If an item is good, most high scorers on the overall test will answer the item correctly. Similarly, most low scorers on the overall test (those with a poor understanding of the subject matter) will answer the item incorrectly.
What is the Item Discrimination Index? The degree to which an item differentiates correctly on the behaviour the test is designed to measure.
How do we approach item discrimination? Two approaches: 1. Item-total correlations: closely related to item reliability. This is the correlation between each item and the total test score. Higher correlation = greater internal consistency of the test. 2. Index of discrimination
What is the Item Discrimination Index? Difference in pass rate on a given item between a high ability group and a low ability group. Those who perform well on the test as a whole should have higher pass rates for any given item than those who perform poorly overall.
Specifics of Item Discrimination Index Symbolised by "d": d = index of discrimination for each item. Compares performance on a particular item by the high ability group, i.e. the top 27% of test takers, and the low ability group, i.e. the bottom 27% of test takers. *27% only if you have a normal distribution
What do the d values (item discrimination index values) tell us? d ranges from -1 to +1. A positive d value = the item discriminates in the right direction; the higher the value, the better. A d of 0 = the item does not discriminate at all. A negative d value = red flag! It indicates that low scoring test takers are more likely to answer the item correctly than high scoring test takers - the item does NOT discriminate well
How is Item Discrimination Index, d, calculated? d = (U - L) / n, where U = number of high scorers answering the item correctly, L = number of low scorers answering the item correctly, and n = number of test takers in each group. Equivalently, d is the difference between the proportions of the two groups answering the item correctly.
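A minimal sketch of the index of discrimination, taking d = (U - L) / n with U and L as counts of correct answers in the upper and lower groups. The function name, the rounding used to size the 27% groups and the ten test-takers' scores are assumptions for the example; tied total scores are simply left in their original order.

```python
def discrimination_index(item_scores, total_scores, fraction=0.27):
    """d = (U - L) / n for one 0/1 item.

    The upper and lower groups are the top and bottom `fraction` of
    test-takers ranked by total score; n is the size of each group.
    """
    n_takers = len(total_scores)
    n = max(1, round(n_takers * fraction))
    # Indices of test-takers ordered from highest to lowest total score.
    order = sorted(range(n_takers), key=lambda i: total_scores[i], reverse=True)
    upper = order[:n]
    lower = order[-n:]
    u_correct = sum(item_scores[i] for i in upper)
    l_correct = sum(item_scores[i] for i in lower)
    return (u_correct - l_correct) / n


# Hypothetical data for 10 test-takers, so each group holds 3 people.
totals = [20, 18, 17, 15, 14, 12, 11, 9, 7, 5]
good_item = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0]
reversed_item = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

print(round(discrimination_index(good_item, totals), 3))  # 0.667
print(discrimination_index(reversed_item, totals))        # -1.0
```

The second item is the red-flag case: every low scorer in the bottom group passes it and every high scorer in the top group fails it.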
What is one scenario when a tool/test is considered useless? When there is no discrimination and therefore you can't make any decisions, the test is basically useless. A test isn't just for better understanding but also used to make decisions/judgments about people.
Item characteristic curves (ICC) These are curves created for each item and can be done in addition to calculating indices of item discrimination.
How does an item characteristic curve discriminate? The shape of the curve reveals the extent to which an item discriminates between high ability and low ability groups: steeper curve = greater discrimination. The curve also shows difficulty: the more difficult an item, the further its curve shifts to the right. ICCs are also used to judge the fairness of an item.
What are some other considerations in item analysis? 1. Guessing: can't be controlled for entirely, but can be partly accounted for when setting difficulty (setting item p values). It interferes with item analysis, especially if people guess correctly. 2. Item fairness: undermined when items favour a particular group of test takers. 3. Speed tests
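One rough way to see such a curve from raw data is to tabulate, for each total score, the proportion of test-takers who answered the item correctly. This sketch uses the total test score as a stand-in for ability, which is an assumption of the example (fitted ICCs in item response theory use an estimated ability instead); the function name and data are hypothetical.

```python
from collections import defaultdict


def empirical_icc(item_scores, total_scores):
    """Proportion answering the item correctly at each total-score level."""
    correct = defaultdict(int)
    count = defaultdict(int)
    for item, total in zip(item_scores, total_scores):
        correct[total] += item
        count[total] += 1
    return {total: correct[total] / count[total] for total in sorted(count)}


# Hypothetical data: 8 test-takers, two at each total score from 1 to 4.
totals = [1, 1, 2, 2, 3, 3, 4, 4]
item = [0, 0, 0, 1, 1, 1, 1, 1]
print(empirical_icc(item, totals))  # {1: 0.0, 2: 0.5, 3: 1.0, 4: 1.0}
```

A sharp rise from 0 to 1 over a narrow range of scores, as here, corresponds to a steep curve and strong discrimination.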
How does guessing affect item analysis? Guessing interferes with our ability to make inferences regarding the items, e.g. if test takers guess correctly, the correct answer doesn't reflect their ability and hence interferes with our judgment of the item.
How do speeded tests affect item analysis? Speeded items also interfere with our ability to draw inferences and judge items, e.g. if items were failed due to lack of time, this is interpreted as an incorrect response, thereby interfering with how we understand an item.
How do we try to remedy the effect of guessing and speed on our item analysis? For item analysis, people given the test items in the testing phase should not guess, and the test shouldn't be given under speed conditions. However, in real administration, once the test is finalised, it can be given under speed conditions and people can guess.
Why can we give finalised tests under speeded conditions and not be fussed about people guessing? Because all items have already been analysed. Speed and guessing will no longer interfere with our judgment, understanding and inferences about an item, unlike during the item analysis phase of test development.
We have looked at quantitative analyses, now what about qualitative analyses? Qualitative analyses complement quantitative analyses. They are very useful for understanding why items aren't performing as expected.
What are two popular approaches to qualitative analysis? 1. Getting test takers to "think aloud" during test administration, to give us an insight into their thinking 2. Expert panels: people from different communities/backgrounds ascertain the fairness of the test
Explain the "think aloud" method of qualitative analysis. Rests on the assumption that by asking people to articulate their thoughts as they are answering, you can gain insight into how they arrive at an answer. Not all thought processes can be verbalised, so make sure your process of interest can be before using this method.
What are the pros and cons of the "think aloud" method of qualitative analysis? Some mental processes involved in working out an answer can be verbalised, but not all mental processes used to arrive at an answer are available for declaration, e.g. we don't have access to intuitive judgments, which are automatic, fast and unconscious.
Explain the "expert panels" method of qualitative analysis Getting representatives from various backgrounds/communities to serve on a panel to ascertain whether items are functioning fairly and aren't, for example, unclear, offensive, or presupposing familiarity with certain cultural norms - all of which can affect answering.
Created by: jecca168
 

 


