1 Way ANOVA II

Term | Definition
Multiple comparisons - testing statistical significance | we see two different statistics when following up with mean comparisons
with a t statistic and p value | typically used/seen in post hoc, pairwise (simple) comparisons
with an F statistic and p value | typically used/seen in planned comparisons; could be simple or complex
t statistic calculation | very similar to how we run an independent t test
F statistic calculation | involves a technique called linear contrast, which breaks down the between-group (model) variance from the omnibus test into (even smaller) parts
the F test is equivalent to a t test when we only compare two sets of means (a single-df comparison) | F = t^2, or t = sqrt(F), with the same p value
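The single-df equivalence (F = t^2, t = sqrt(F)) is easy to check numerically; a minimal sketch in Python, reusing the t = 2.71 value that appears in this deck's write-up example:

```python
import math

# t value for a single-df (two-group) comparison; 2.71 is taken from
# the costume write-up example elsewhere in this deck
t = 2.71

F = t ** 2              # F = t^2 for a single-df comparison
t_back = math.sqrt(F)   # t = sqrt(F) recovers the t statistic

print(round(F, 2))      # 7.34
print(round(t_back, 2)) # 2.71
```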
Use the previous costume example | after the one-way ANOVA omnibus test, we obtained a sig F, indicating that at least one group mean differs from the rest
F statistic: intro to linear contrast | contrast: the difference between the two sets of means we want to compare
Linear contrast equation in an example (1) | use the previous costume example: we have the means of the mickey group (x1), superman group (x2), and batman group (x3)
suppose we want to calculate the difference between the M and B groups: x1 - x3 | we can also assign the weights so that the difference (contrast) would be the mean of the batman group minus the mean of the mickey group
linear contrast equation | as shown earlier, we assign weights (contrast coefficients) to the means (of the groups) we wish to compare
Principles for assigning weights/contrast coefficients | some weights are positive (+) while others are negative (-)
the comparison is between the means of the groups with positive weights and the means of the groups with negative weights
the actual value of the weight represents the weighting assigned to the group mean, e.g. if a group has no weight (is not being compared), assign 0; if it contributes to the average of two groups, assign +1/2
the weights must total zero, and most often we let the absolute values of the weights total 2 (but see exceptions later)
set up contrasts in line with the RQ and RH, ensuring they are interpretable (don't set up many contrasts just because we can!)
assign weights (contrast coefficients)
RQ: does reading and listening result in better text comprehension than reading only? | DV (numerical): reading comprehension. IV (categorical): considering that readers may have different reading speeds, we included three different reading-and-listening conditions. RH: comprehension (reading + listening) > comprehension (reading only)
assuming we have a stat sig omnibus F | a priori (planned): mean of all reading + listening groups (reading text with faster, aligned, and slower audio) vs mean of the reading-only group
alternatively, a priori (planned), with x(faster), x(aligned), x(slower), x(reading only) | 1) group mean (reading text with faster audio) vs group mean (reading only) 2) GM (reading text with aligned audio) vs GM (reading only) 3) GM (reading text with slower audio) vs GM (reading only)
RQ: what's the most effective learning mode for students' learning outcomes? | assuming we have a stat sig omnibus F. DV (numerical): learning outcomes (measured by students' final grades). IV (categorical): students engaged in lectures in one of three modes: 1) attend the in-person live class, 2) listen to the live stream at home, and 3) listen to the lecture recording
and my RHs were | 1) listening to lectures live may be more effective than the recording 2) among the two live modes, in-person may be better than the live stream. (?) x(in-person live) + (?) x(online live) + (?) x(recording)
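The (?) placeholders above are contrast coefficients to be filled in following the weighting principles. A minimal sketch of one plausible assignment for each RH; the group means below are made-up illustration values, not from the deck:

```python
# Hypothetical group means for the three lecture modes (assumed numbers)
means = {"in_person": 82.0, "live_stream": 79.0, "recording": 74.0}

# RH 1: live (in-person + stream, averaged) vs the recording
w_rh1 = {"in_person": 0.5, "live_stream": 0.5, "recording": -1.0}

# RH 2: in-person vs live stream (recording not compared, so weight 0)
w_rh2 = {"in_person": 1.0, "live_stream": -1.0, "recording": 0.0}

for w in (w_rh1, w_rh2):
    assert abs(sum(w.values())) < 1e-12           # weights must total zero
    contrast = sum(w[g] * means[g] for g in means)
    print(round(contrast, 2))
```

With these assumed means, RH 1 gives a contrast of 6.5 and RH 2 gives 3.0; the sign of each contrast shows which side of the comparison has the higher mean.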
linear contrasts: calculating the F statistic | we know how to assign weights to compare the means of groups and obtain the contrast (the difference between the groups)
Know that linear contrast breaks down the between-group (model) variance into (even smaller) parts
let's see how we can use the linear contrast(s) to break down the between-group variance and calculate the F value
still using the costume example | suppose we want to compare the mean of the superman and batman groups combined vs the group mean of mickey
F statistic calculation STEPS | 1. set up contrast coefficients (weights) - decide which (sets of) means to compare 2. calculate the contrast, the difference between the (sets of) means 3. calculate the sums of squares: SS(contrast) = contrast^2 / sum(w_i^2 / n_i) 4. calculate the mean squares: MS(contrast) = SS(contrast) / df(contrast)
where df(contrast) = 1 | we are comparing one mean to another mean, or a set of means to another set of means; that is why df(contrast) = 1
5. calculate the F value = MS(contrast) / MS(within); obtain MS(within) from the omnibus F model
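A minimal sketch of the five steps in Python. The raw data are not given in the deck, so the group means and sample sizes below are assumptions, chosen so that the contrast (5.3) and MS(within) (25.57) match the figures quoted later in the effect-size and write-up cards:

```python
# Hypothetical inputs: per-group means and an equal group size of 10
# (assumed; chosen to reproduce contrast = 5.3 from the deck's example)
means = {"mickey": 7.9, "superman": 13.5, "batman": 12.9}
n = 10
ms_within = 25.57  # MS(within) taken from the omnibus ANOVA (assumed)

# Step 1: weights - (superman + batman)/2 vs mickey
w = {"mickey": -1.0, "superman": 0.5, "batman": 0.5}
assert abs(sum(w.values())) < 1e-12  # weights must total zero

# Step 2: the contrast (difference between the sets of means)
contrast = sum(w[g] * means[g] for g in means)

# Step 3: SS(contrast) = contrast^2 / sum(w_i^2 / n_i)
ss_contrast = contrast ** 2 / sum(wi ** 2 / n for wi in w.values())

# Step 4: MS(contrast) = SS(contrast) / df(contrast), with df = 1
ms_contrast = ss_contrast / 1

# Step 5: F = MS(contrast) / MS(within)
F = ms_contrast / ms_within

print(round(contrast, 2))  # 5.3
print(round(F, 2))         # 7.32
```

Note that t = sqrt(F) is about 2.71 here, consistent with the t(27) = 2.71 reported in the deck's write-up card.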
TIPS | the weights must total zero (and most often we let the absolute values of the weights total 2, but here's an exception)
we can use whole numbers as weights to get rid of decimals and fractions, e.g. original coefficients x 2
the contrast (difference between the means) is now twice as big as the original contrast, but the SS remains the same! the F value also remains the same!
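This scale-invariance is easy to verify; a minimal sketch using hypothetical group means (assumed values, not from the deck):

```python
# Hypothetical group means and an equal group size (assumed values)
means = [7.9, 13.5, 12.9]
n = 10

# Original weights vs the same weights multiplied by 2
for scale in (1, 2):
    w = [scale * c for c in (-1.0, 0.5, 0.5)]
    contrast = sum(wi * m for wi, m in zip(w, means))
    ss = contrast ** 2 / sum(wi ** 2 / n for wi in w)
    # the contrast doubles, but SS(contrast) - and hence F - is unchanged
    print(round(contrast, 2), round(ss, 2))
```

Doubling the weights doubles the contrast (the numerator is squared) but also quadruples sum(w_i^2 / n_i) in the denominator, so SS(contrast) and the resulting F cancel out to the same value.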
effect size | irrespective of whether a t or F statistic is calculated for statistical significance, we use Cohen's d for a standardised effect size
as in an independent t test, for multiple comparisons of means (after a sig omnibus F test)
use the previous example | suppose we want to compare the mean of the superman and batman groups combined vs the group mean of mickey - Cohen's d = 5.3 / sqrt(25.57) = 1.05
Cohen's (1988) rule of thumb for Cohen's d | 0.2 (small effect), 0.5 (medium effect), 0.8 (large effect)
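The d = 1.05 above is the contrast standardised by the pooled within-group SD; a minimal sketch using the 5.3 and 25.57 values from the card:

```python
import math

contrast = 5.3     # difference between the combined S+B mean and mickey
ms_within = 25.57  # MS(within) from the omnibus ANOVA

# Cohen's d for a contrast: divide by sqrt(MS(within)), the pooled SD
d = contrast / math.sqrt(ms_within)
print(round(d, 2))  # 1.05 - large by Cohen's (1988) rule of thumb
```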
interim summary | when following up with mean comparisons, we could use (would see) both the t and F statistics
for the t statistic, its calculation is very similar to an independent t test - it's typically used/seen in post hoc, pairwise (simple) comparisons
for the F statistic, its calculation involves linear contrast: we assign weights (contrast coefficients) to different groups to signify the group means we want to compare
it compares the means by dividing the between-group variance into smaller parts
it is often used (seen) in planned comparisons and can be a simple or complex comparison
Testing statistical significance for multiple comparisons in Stata | that is more or less how we obtain the two statistics (t and F) manually; two common commands in Stata are used for multiple comparisons
Linear contrast in Stata (F statistic) | weights, or contrast coefficients: W1 = contrast coefficient for the 1st group mean, W2 = contrast coefficient for the 2nd group mean, Wx = contrast coefficient for the xth group mean
Step 1 - find how the groups within the IV are coded in the dataset, because the weights (contrast coefficients) must be assigned consistently with the group order
Step 2 - assign weights for the means (of the different groups) we wish to compare | ! potential issue with running multiple comparisons
type I error rate - per comparison vs family-wise | when we run the follow-up/secondary analyses of a one-way ANOVA, we run into the same issue: an increasing type I error rate
costume demo example again | a 5 percent chance of committing a type I error per comparison
suppose we compare each pair of means in the three groups | per-comparison error rate: the chance of not committing any type I error across all 3 comparisons is (1 - .05)^3
chance of committing at least one type I error (family-wise error rate, or overall error rate) | the probability of a type I error as a function of the number of pairwise comparisons, where a = .05 for any one comparison: aFW = 1 - (1 - a)^k
the more comparisons we run, the higher the family-wise error rate will be
where k = number of comparisons | when conducting multiple comparisons, we need to make adjustments to keep aFW at the desirable level (i.e. 0.05)
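A minimal sketch of how aFW grows with the number of comparisons k, at a = .05 per comparison:

```python
# Family-wise error rate: aFW = 1 - (1 - a)^k
a = 0.05
for k in (1, 3, 10):
    afw = 1 - (1 - a) ** k
    print(k, round(afw, 3))
```

With one comparison aFW stays at .05, but it climbs to about .143 for 3 comparisons and about .401 for 10, which is why an adjustment is needed.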
controlling the type I error rate (techniques) for multiple comparisons | note that for all the comparisons/contrasts we calculated so far, we haven't controlled for aFW (the family-wise error rate)
there are many ways to keep aFW at a desirable level (e.g. .05): 1. Bonferroni adjustment (correction) 2. Tukey's honestly significant difference (HSD) 3. Scheffe 4. Sidak 5. Dunnett | we will introduce three common ones (the first three)
Bonferroni adjustment (correction) by hand | adjust the per-comparison error rate, i.e. the sig level for each comparison: aPER = 0.05/k (k = number of comparisons)
suppose we compare each pair of means for all three groups in our costume example | we then compare the p value of each comparison to the adjusted per-comparison error rate
after adjustments for aPER, which comparisons are significant? | Stata does it slightly differently! it displays the adjusted p value, which is computed as
the unadjusted p value x k, and we use the usual cutoff for significance (0.05) to judge whether a comparison is statistically significant or not
pwmean | pwmean DV, over(IV) effects mcompare(bonferroni) | the adjusted p = unadjusted p x k
final notes - use of the Bonferroni method | mathematically, using the original p to compare against the adjusted aPER (aPER = 0.05/k), or using the adjusted p (p_adjusted = p_unadjusted x k) to compare against the original 0.05, are the same
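The equivalence of the two forms of the Bonferroni rule can be checked directly; a minimal sketch with hypothetical p values:

```python
# Bonferroni: comparing p against a/k gives the same decision as
# comparing the adjusted p (p * k, capped at 1) against a
a, k = 0.05, 3
for p in (0.004, 0.02, 0.4):  # hypothetical unadjusted p values
    by_adjusted_alpha = p < a / k
    by_adjusted_p = min(p * k, 1.0) < a
    assert by_adjusted_alpha == by_adjusted_p
    print(p, by_adjusted_alpha)
```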
although it can be used for both a priori (planned) and post hoc comparisons, it is often used in a priori (planned) comparisons (because it's flexible!)
when conducting a large number of contrasts (e.g. having many research hypotheses), the Bonferroni test can become conservative, e.g. running 10 comparisons: adjusted aPER = 0.05/10 = 0.005
Tukey's honestly significant difference (HSD) | it compares means via a studentized range (q) statistic (not a t statistic)
the statistic/distribution models the largest difference between the means (mean_max - mean_min): all pairs of comparisons will be "restricted" in this range
a higher threshold is calculated (using q rather than t); only a difference larger than this adjusted threshold (of group difference) would be considered significant
in layperson's terms: it allows you to test all possible pairwise means (performing all tests at a time) while maintaining an overall or family-wise error rate at the chosen level (e.g. aFW < 0.05)
most often used for post hoc comparisons (i.e. comparing all possible pairwise means - not suitable for complex comparisons)
Tukey's HSD in Stata | pwmean DV, over(IV) effects mcompare(tukey)
mcompare(tukey) cannot be used along with the contrast command; Stata displays the adjusted p value
Scheffe's method | like Tukey's test, Scheffe's test also sets a family-wise error rate at the chosen level
it sets the aFW against all possible contrasts (which can be simple or complex comparisons)
advantage: allows lots of contrasts while strictly controlling for aFW | disadvantage: one of the most conservative multiple comparison tests; hard to reach sig
most often used for post hoc comparisons (for both simple and complex comparisons), or when we genuinely have a large number of comparisons and need a general correction
Bonferroni and Scheffe alternative Stata command - pairwise comparisons | oneway DV IV, bonferroni scheffe
which control method to use? | both Tukey's HSD and Scheffe control for the aFW
when comparing all possible pairs of means, Tukey's HSD has more power (is more sensitive) and is more popular among researchers; Scheffe is more sensitive for complex comparisons or when a large set of comparisons is genuinely needed
Bonferroni adjusts the aPC | flexible; useful for a priori or planned comparisons
however, when conducting many contrasts (e.g. using the 'pwmean' command in Stata to compare all pairwise means), it is not particularly powerful
but we should NOT run all three tests to see which gets the more favourable result - that is p-hacking
adjustment method decision tree | types of comparisons (RH-oriented) -> statistics + Stata -> error rate adjustment method
Results write-up (putting the two steps together) | RQ and RH
RQ: do children wearing superhero (S or B) or character (M) costumes sustain different injuries? | RH: children who wear superhero costumes may get injured more often than children who wear character costumes
write-up, last week: there were significant differences in the number of injuries for children who wear different costumes,
according to a one-way ANOVA, F(2,27) = 3.85, p = .034, with a large effect size, eta^2 = .22
this week: a further comparison between children who wear superhero costumes (S and B) and those who wear mickey costumes showed a significant difference between the two
specifically, children from the former group (mean 13.2) had more injuries than the latter (mean 7.9), t(27) = 2.71, p = .012, 95% CI of the contrast (1.3, 9.3), with a large effect size, d = 1.05, supporting our research hypothesis
Created by: brendonpizarro1
 

 


