One-Way ANOVA II
Term | Definition |
---|---|
Multiple comparisons - testing statistical significance | we see two different statistics when following up the omnibus test with mean comparisons |
with a t statistic and p value | typically used/seen in post hoc, pairwise (simple) comparisons |
with an F statistic and p value | typically used/seen in planned comparisons; could be simple or complex |
t statistic | its calculation is very similar to how we run an independent t test
F statistic | its calculation involves a technique called linear contrast, which breaks the between-group (model) variance from the omnibus test down into (even smaller) parts
the F test is equivalent to a t test | when we only compare two sets of means (a single-df comparison): F = t² or t = √F, with the same p value
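A quick numerical check of this equivalence, using the t value that appears in the results write-up at the end of this deck (assuming it refers to this kind of single-df comparison):

```latex
t(27) = 2.71 \;\Rightarrow\; F(1, 27) = 2.71^2 \approx 7.34, \qquad \text{same } p \text{ value}
```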
Use the previous costume example | after the one-way ANOVA omnibus test, we obtained a significant F, indicating that at least one group mean differs from the rest
f statistic: intro to linear contrast | contrast: the difference between the two sets of means we want to compare |
Linear contrast equation in an example (1) | using the previous costume example, we have the means of the Mickey group (x̄1), Superman group (x̄2) and Batman group (x̄3)
suppose we want to calculate the difference between the Mickey and Batman groups: x̄1 − x̄3 | we could also assign the weights so that the difference (contrast) is the mean of the Batman group minus the mean of the Mickey group (x̄3 − x̄1)
linear contrast equation | we assign weights (contrast coefficients) to the means (of the groups) we wish to compare, as in the sketch below
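A minimal sketch of the linear contrast equation, written for a design with k groups (w1 ... wk are the contrast coefficients):

```latex
\hat{\psi} = w_1\bar{x}_1 + w_2\bar{x}_2 + \dots + w_k\bar{x}_k = \sum_{i=1}^{k} w_i\,\bar{x}_i
% costume example: weights (+1, 0, -1) give \hat{\psi} = \bar{x}_1 - \bar{x}_3 (Mickey minus Batman),
% while (-1, 0, +1) give the Batman-minus-Mickey contrast instead
```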
Principles for assigning weights/contrast coefficients | some weights are positive(+) while others are negative (-) |
the comparison is between the means of groups | with positive weights and the means of groups with negative weights |
the actual value of the weight | represents the weighting given to each group mean, e.g. if a group has no weight (is not being compared), assign 0; if we take the average of two groups, assign +1/2 to each
the weights must total zero | and most often we let the weights total 2 in absolute value, but see exceptions later
set up contrasts in line with the RQ and RH | ensuring they are interpretable (don't set up many contrasts just because we can!)
assign weights | contrast coefficients |
RQ: does reading and listening result in better text comprehension than reading only? | DV (numerical): reading comprehension. IV (categorical): considering readers may have different reading speeds, we included three different reading + listening conditions (plus a reading-only condition). RH: comprehension (reading + listening) > comprehension (reading only)
assuming we have a statistically significant omnibus F, a priori (planned) | mean of all reading + listening groups (reading text with faster, aligned and slower audio) vs mean of the reading-only group
alternatively, a priori (planned), using x̄faster, x̄aligned, x̄slower and x̄reading-only | 1) group mean (reading text with faster audio) vs group mean (reading only); 2) group mean (reading text with aligned audio) vs group mean (reading only); 3) group mean (reading text with slower audio) vs group mean (reading only) - one possible set of weights is sketched below
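A sketch of one possible weight assignment for these planned comparisons. The group order (faster, aligned, slower, reading only) and the specific coefficients are illustrative assumptions that follow the principles above:

```latex
% combined contrast: all reading+listening groups vs reading only
\hat{\psi}_1 = \tfrac{1}{3}\bar{x}_{\text{faster}} + \tfrac{1}{3}\bar{x}_{\text{aligned}} + \tfrac{1}{3}\bar{x}_{\text{slower}} - 1\,\bar{x}_{\text{reading only}}
% one of the separate contrasts: faster audio vs reading only
\hat{\psi}_2 = 1\,\bar{x}_{\text{faster}} + 0\,\bar{x}_{\text{aligned}} + 0\,\bar{x}_{\text{slower}} - 1\,\bar{x}_{\text{reading only}}
```

In both cases the weights sum to zero and total 2 in absolute value.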
RQ: what's the most effective learning mode for students' learning outcomes? assuming we have a statistically significant omnibus F | DV (numerical): learning outcomes (measured by students' final grades). IV (categorical): students engaged in lectures in one of three modes: 1) attend the in-person live class, 2) listen to the live stream at home, or 3) listen to the lecture recording
and my RHs were | 1) listening to lectures live may be more effective than listening to the recording; 2) among the two live modes, in person may be better than the live stream. (?) × x̄in-person live + (?) × x̄online live + (?) × x̄recording (one possible assignment is sketched below)
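One possible way to fill in the (?) weights, assuming the group order in-person live, online live, recording (an illustrative choice, not the only valid one):

```latex
% RH1: the two live modes combined vs the recording
\hat{\psi}_1 = \tfrac{1}{2}\bar{x}_{\text{in-person}} + \tfrac{1}{2}\bar{x}_{\text{online live}} - 1\,\bar{x}_{\text{recording}}
% RH2: in-person live vs online live (recording not involved, weight 0)
\hat{\psi}_2 = 1\,\bar{x}_{\text{in-person}} - 1\,\bar{x}_{\text{online live}} + 0\,\bar{x}_{\text{recording}}
```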
linear contrasts: calculate the F statistic | we know how to assign weights to compare the means of groups and obtain the contrast (difference between the groups)
we know that a linear contrast | breaks the between-group (model) variance down into (even smaller) parts
let's see how we can use the linear contrast(s) | to break down the between-group variance and calculate the F value
still using the costume example | suppose we want to compare the mean of the Superman and Batman groups combined vs the group mean of the Mickey group
F statistic calculation STEPS | 1. set up contrast coefficients (weights): decide which (sets of) means to compare; 2. calculate the contrast, i.e. the difference between the (sets of) means; 3. calculate the sums of squares: SS(contrast) = contrast² / Σ(wᵢ²/nᵢ); 4. calculate the mean squares: MS(contrast) = SS(contrast)/df(contrast)
where df(contrast) = 1 | we are comparing one mean (or one set of means) to another, which is why df(contrast) = 1
5. calculate the F value | F = MS(contrast)/MS(within); obtain MS(within) from the omnibus ANOVA model (a worked example follows below)
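A worked sketch for the Superman + Batman combined vs Mickey contrast. The group size n = 10 per group and MS(within) = 25.57 are assumptions reconstructed from the df = 27, the group means (13.2 and 7.9) and the Cohen's d reported later in this deck:

```latex
% weights (-1, +1/2, +1/2) for (Mickey, Superman, Batman), n = 10 per group
\hat{\psi} = \tfrac{1}{2}\bar{x}_S + \tfrac{1}{2}\bar{x}_B - \bar{x}_M = 13.2 - 7.9 = 5.3
SS_{\text{contrast}} = \frac{\hat{\psi}^2}{\sum_i w_i^2/n_i} = \frac{5.3^2}{(1 + 0.25 + 0.25)/10} = \frac{28.09}{0.15} \approx 187.3
MS_{\text{contrast}} = SS_{\text{contrast}}/1 \approx 187.3
F = \frac{MS_{\text{contrast}}}{MS_{\text{within}}} = \frac{187.3}{25.57} \approx 7.3, \qquad t = \sqrt{F} \approx 2.71
```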
TIPS | the weights must total zero (and most often we let them total 2 in absolute value, but here's an exception)
we can use whole numbers | as weights to get rid of decimals and fractions, e.g. original coefficients × 2
the contrast (difference between the means) | is now twice as big as the original contrast, but the SS remains the same, and so does the F value!
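A quick algebraic check of why rescaling the weights leaves SS (and hence F) unchanged, using the SS formula from the steps above:

```latex
% replace each w_i with 2w_i: the contrast doubles, but the denominator quadruples
\hat{\psi}' = \sum_i (2w_i)\bar{x}_i = 2\hat{\psi}, \qquad
SS'_{\text{contrast}} = \frac{(2\hat{\psi})^2}{\sum_i (2w_i)^2/n_i} = \frac{4\hat{\psi}^2}{4\sum_i w_i^2/n_i} = SS_{\text{contrast}}
```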
effect size | irrespective of whether a t or F statistic is calculated for statistical significance, we use Cohen's d for a standardised effect size
as in an independent t test | Cohen's d is used for multiple comparisons of means (after a significant omnibus F test)
use the previous example: suppose we want to compare the mean of the | Superman and Batman groups combined vs the group mean of Mickey: Cohen's d = 5.3 / √25.57 ≈ 1.05 (the contrast divided by √MS(within), used as the pooled standard deviation)
Cohen's (1988) effect size rule of thumb for Cohen's d | 0.2 (small effect), 0.5 (medium effect), 0.8 (large effect)
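In formula form (assuming, as the numbers above suggest, that √MS(within) serves as the pooled standard deviation for the contrast):

```latex
d = \frac{\hat{\psi}}{\sqrt{MS_{\text{within}}}} = \frac{5.3}{\sqrt{25.57}} \approx 1.05 \quad \text{(a large effect by Cohen's benchmarks)}
```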
interim summary | when following up with mean comparisons, we could use (would see) both t and F statistics
for the t statistic | its calculation is very similar to an independent t test; it's typically used/seen in post hoc, pairwise (simple) comparisons
for the F statistic, its calculation involves a linear contrast | we assign weights (contrast coefficients) to different groups to signify the group means we want to compare
it compares the means | by dividing the between-group variance into smaller parts
it is often used (seen) in planned comparisons | and can be a simple or complex comparison
Testing statistical significance for multiple comparisons in Stata | having seen, more or less, how we obtain the two statistics (t and F) manually, note that two common commands in Stata are used for multiple comparisons
Linear contrast in Stata (F statistic): weights or contrast coefficients | w1 = contrast coefficient for the 1st group mean, w2 = contrast coefficient for the 2nd group mean, ..., wx = contrast coefficient for the xth group mean
step 1 - find how the groups within the IV are coded in the dataset | because the weights (contrast coefficients) must be assigned consistently with the group order
step 2 - assign weights for the means (of the groups) we wish to compare (see the sketch below) | note the potential issue with running multiple comparisons, discussed next
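A minimal Stata sketch of these two steps for the costume example. The variable names injuries and costume, and the coding order Mickey = 1, Superman = 2, Batman = 3, are assumptions for illustration only:

```stata
* Step 1: check how the groups are coded (weights must follow this order)
codebook costume
tabulate costume

* Step 2: fit the omnibus model, then request the linear contrast;
* weights (-1, 0.5, 0.5) compare Mickey against the Superman + Batman average
anova injuries costume
contrast {costume -1 0.5 0.5}, effects
```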
Type I error rate - per comparison vs family-wise | when we run the follow-up/secondary analyses of a one-way ANOVA, we run into the same issue: an inflated Type I error rate
costume demo example again | each comparison carries a 5 percent chance of committing a Type I error (α = .05)
suppose we compare each pair of means across the three groups | with a per-comparison error rate of .05, the chance of not committing any Type I error across all 3 comparisons is (1 − α)³
chance of committing at least one Type I error (family-wise error rate, or overall error rate) | the probability of a Type I error as a function of the number of pairwise comparisons, where α = 0.05 for any one comparison: αFW = 1 − (1 − α)^k
the more comparisons we run | the higher the family wise error rate will be |
where k = number of comparisons | when conducting multiple comparisons, we need to make adjustments to keep αFW at the desired level (i.e. 0.05); see the arithmetic below
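Plugging in the numbers for the three pairwise comparisons in the costume example (α = .05 per comparison):

```latex
\alpha_{FW} = 1 - (1 - \alpha)^k = 1 - (0.95)^3 \approx 0.143
```

So running all three pairwise comparisons without adjustment inflates the family-wise error rate to roughly 14%.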
controlling the Type I error rate (techniques) for multiple comparisons | note that for all the comparisons/contrasts we have calculated so far, we haven't controlled for αFW (the family-wise error rate)
there are many ways to keep αFW at a desirable level (e.g. .05) | 1. Bonferroni adjustment (correction); 2. Tukey's honestly significant difference (HSD); 3. Scheffe; 4. Sidak; 5. Dunnett. We will introduce three common ones (the first three)
Bonferroni adjustment (correction) by hand | adjust the per-comparison error rate, i.e. the significance level for each comparison: αPC = αFW / k (k = number of comparisons)
suppose we compare each pair of means for all three groups in our costume example | we then compare the p value of each comparison to the adjusted per-comparison error rate
after adjusting αPC, which comparisons are significant? | Stata does it slightly differently! it displays an adjusted p value, which is computed as
unadjusted p value × k | and we use the usual cut-off for significance (0.05) to judge whether a comparison is statistically significant or not (worked arithmetic below)
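The arithmetic for the costume example with k = 3 pairwise comparisons, showing the two equivalent ways of applying the same correction:

```latex
\alpha_{PC} = \frac{0.05}{3} \approx 0.0167 \quad \text{(compare each unadjusted } p \text{ to this)}
p_{\text{adjusted}} = p_{\text{unadjusted}} \times 3 \quad \text{(compare to the usual 0.05, as Stata does)}
```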
PWMEAN | pwmean DV, over(IV) effects mcompare(bonferroni) - Stata displays the adjusted p value (unadjusted p × k)
final notes - use of the Bonferroni method | mathematically, comparing the original p against the adjusted αPC (αPC = 0.05/k) and comparing the adjusted p (p_adjusted = p_unadjusted × k) against the original significance level (0.05) are the same
although it can be used for both a priori (planned) and post hoc comparisons | it is most often used in a priori (planned) comparisons (because it's flexible!)
when conducting a large number of contrasts (e.g. having many research hypotheses) | the Bonferroni test can become conservative, e.g. running 10 comparisons gives an adjusted αPC = 0.05/10 = 0.005
Tukey's honestly significant difference (HSD) | it compares means via a studentized range (q) statistic (not a t statistic) |
the statistic/distribution models the largest difference between the means | (mean_max − mean_min): all pairwise comparisons will be "restricted" to this range
a higher threshold is calculated (using q compared to t) | only a difference larger than this adjusted threshold (of group difference) would be considered significant |
In layperson's terms: it allows you to test all possible pairwise means (performing all tests at once) | while maintaining an overall or family-wise error rate at the chosen level (e.g. αFW < 0.05)
most often used for post hoc comparisons | (i.e. comparing all possible pairwise means; not suitable for complex comparisons)
Tukey's HSD in Stata | pwmean DV, over(IV) effects mcompare(tukey)
mcompare(tukey) | cannot be used with the contrast command; Stata displays the adjusted p value
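A usage sketch for the costume example, again assuming the hypothetical variable names injuries and costume:

```stata
* all pairwise mean comparisons with Tukey's HSD adjustment
* (Stata reports Tukey-adjusted p values and confidence intervals)
pwmean injuries, over(costume) effects mcompare(tukey)
```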
Scheffe's method | like Tukey's test, Scheffe's test also sets the family-wise error rate at the chosen level
it sets the αFW | against all possible contrasts (which can be simple or complex comparisons)
advantage: allows lots of contrasts while strictly controlling αFW | disadvantage: one of the most conservative multiple comparison tests, so it is hard to reach significance
most often used | for post hoc comparisons (for both simple and complex comparisons)
or when we genuinely have a large number of comparisons | and need a general correction
Bonferroni and Scheffe alternative Stata command - pairwise comparisons | oneway DV IV, bonferroni scheffe
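A usage sketch with the same hypothetical variables; oneway reports the Bonferroni- and Scheffe-adjusted pairwise p values side by side:

```stata
* one-way ANOVA plus Bonferroni- and Scheffe-adjusted pairwise comparisons
oneway injuries costume, bonferroni scheffe
```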
which control method to use? | both Tukey's HSD and Scheffe control the αFW
when comparing all possible pairs of means | Tukey's HSD has more power (is more sensitive) and is more popular among researchers; Scheffe is better suited to complex comparisons or when a large set of comparisons is genuinely needed
Bonferroni adjusts the αPC | flexible, useful for a priori or planned comparisons
however, when conducting many contrasts | (e.g. using the pwmean command in Stata to compare all pairwise means), it is not particularly powerful
but we should NOT run all three tests to see which gives the more favourable results | that is p-hacking
adjustment method decision tree | type of comparison (RH-oriented) → statistic + Stata command → error rate adjustment method
Results write up (putting two steps together) | RQ and H |
RQ and H | RQ: do children wearing superhero (S or B) or character (M) costumes sustain different injuries? RH: children who wear superhero costumes may get injured more often than children who wear character costumes
Write-up | last week: there were significant differences in the number of injuries for children who wear different costumes,
according to a one-way ANOVA analysis | F(2, 27) = 3.85, p = .034, with a large effect size η² = .22
this week: a further comparison between children who wear superhero costumes (S and B) | and those who wear Mickey costumes showed a significant difference between the two
specifically, children from the former group (mean 13.2) | had more injuries than the latter (mean 7.9), t(27) = 2.71, p = .012, 95% CI of the contrast (1.3, 9.3), with a large effect size d = 1.05, supporting our research hypothesis