Stats Final
| Question | Answer |
|---|---|
| When to use Pearson Correlation Coefficient | Only scale variables, question about association |
| When to use Regression | Only scale variables, question about prediction |
| When to use a z-test | Nominal independent variable; scale dependent variable; only one independent variable with only two levels, one represented by a sample and one by the population; population mean and standard deviation both known |
| When to use a single-sample t-test | Nominal independent variable; scale dependent variable; only one independent variable with only two levels, one represented by a sample and one by the population; only the population mean known (not the standard deviation) |
| When to use a paired-samples t-test | Nominal independent variable; scale dependent variable, only one independent variable with only two levels... two samples, within-groups design |
| When to use an independent-samples t-test | Nominal independent variable; scale dependent variable, only one independent variable with only two levels... two samples, between-groups design |
| When to use a one-way within-groups ANOVA | Nominal independent variable; scale dependent variable, only one independent variable with three or more levels... within-groups design. |
| When to use a one-way between-groups ANOVA | Nominal independent variable; scale dependent variable, only one independent variable with three or more levels... between-groups design. |
| When to use a Factorial ANOVA (two-way between-groups ANOVA) | Nominal independent variable; scale dependent variable, two or more independent variables |
| When to use a chi-square test for goodness-of-fit | Only nominal variables... only one nominal variable |
| When to use a chi-square test for independence | Only nominal variables... two nominal variables |
| Why is using an ANOVA more beneficial than conducting multiple t-tests? | Every time you conduct a t-test there is a chance of making a Type I error (usually 5%). Running two t-tests on the same data raises that chance to about 10%, three tests to about 15%, and so on. An ANOVA controls for this: the overall Type I error rate stays at 5% (see the familywise-error sketch after the table) |
| Type I error | Rejecting the null hypothesis when the null hypothesis is true. The alpha symbol, α, is usually used to denote the chance of making a Type I error |
| What is the difference between factors and levels? | A factor is a term used to describe an independent variable in a study with more than one independent variable. Each factor has two or more levels. Example: gender is a factor, male and female are levels. |
| Between-groups variance | The estimate of the population variance based on the differences among sample means |
| F statistic: a big number in the numerator indicates a great deal of distance/spread between the means and suggests _________. | They came from different populations |
| F statistic: a smaller number in the numerator indicates very little spread between the means suggesting _______. | That the samples came from the same population |
| Within-groups variance | The estimate of the population variance based on the differences within each of the three or more sample distributions; it is the average of the variances of the samples |
| Large F-statistic | Indicates that the between-groups variance (numerator) is much larger than the within-groups variance (denominator); this means that the sample means are different from one another, so reject the null |
| Small F-statistic (or close to 1) | Indicates that the between-groups variance (numerator) is smaller than, or about the same as, the within-groups variance (denominator); this means that the sample means are the same, so fail to reject the null |
| Grand mean | The mean of every score in a study, regardless of which sample the score came from |
| F is based on ____ and T on the standard deviation | Variance |
| Three assumptions in an ANOVA | 1. Random selection 2. Normally distributed population; this assumption becomes less important as the sample size increases 3. Homoscedasticity (homogeneity of variance): the samples all come from populations with the same variance |
| The formula for the between-groups degrees of freedom | dfbetween = Ngroups − 1 |
| The formula for the within-groups degrees of freedom (one-way between-groups ANOVA) | dfwithin = df1 + df2 + df3 + df4... (the sum of each sample's degrees of freedom) |
| Within-groups SS is calculated by... | Subtracting the mean of each group from every score (X) in that group, squaring each of these deviations, and adding them all together |
| Between-groups SS is calculated by... | Subtracting the grand mean from each group's mean (once for every score in that group), squaring these deviations, and adding them all together |
| 6 Steps of Hypothesis Testing | 1. Identify the populations, distribution, and assumptions 2. State hypotheses 3. Determine characteristics of the comparison distribution 4. Determine the critical value 5. Calculate test statistic 6. Make a decision |
| MS | Mean square; used instead of s² as another name for variance. It is the SS divided by the degrees of freedom, computed for both the between-groups and within-groups sources. Both MSbetween and MSwithin are used to find the F statistic (see the ANOVA sketch after the table) |
| Post-hoc test | A statistical procedure frequently carried out after the null hypothesis has been rejected in an analysis of variance; it allows us to make multiple comparisons among several means; often referred to as a follow-up test |
| Effect Size for ANOVA | R²: the proportion of variance in the dependent variable that is accounted for by the independent variable; SSbetween/SStotal. Tells us how big the difference is, but not which pairs of means are responsible for it |
| Tukey HSD test | A widely used post hoc test that determines the differences between means in terms of standard error; the HSD is compared to a critical value; sometimes called the q test. |
| N' | "Harmonic mean." When sample sizes differ, you must calculate a weighted sample size before calculating the standard error: N' = Ngroups / (the sum of 1/N for each group) (see the Tukey HSD sketch after the table) |
| HSD= | (M1-M2)/sM |
| sM= | The square root of MSwithin/N' |
| Matched-group design | Similar to a within-groups design; it has different people in each level of the independent variable, but they are similar (matched) on characteristics important to the study |
| What are the advantages for using a within-groups ANOVA? | Reduces errors that are due to differences between the groups, because each group includes exactly the same participants. More statistical power, fewer subjects. |
| Concerns presented in using a within-groups ANOVA | Order effects, caused by exposing the subjects to multiple treatments; related to the order in which treatments are given. Ex: a dry wine may get a higher rank if it was preceded by a drier wine and a lower rank if preceded by a sweet wine |
| Interaction | A statistical interaction occurs in a factorial design when two or more independent variables have an effect on the dependent variable in combination that neither independent variable has on its own |
| Main effect | One IV in a two-way ANOVA influences the DV on its own; we evaluate this by disregarding the influence of any other IV. Calculate the F statistic for each IV to see whether its value extends beyond the critical value; if it does, that IV has a significant influence |
| Advantages of conducting a two-way ANOVA | Not only can we assess the independent impact of our two factors, but also we can assess the interaction of the two factors in their effect on the DV. Thus, with the same data we would be able to test three different null hypotheses |
| Cell | A box that depicts one unique combination of levels of the independent variables in a factorial design |
| Why are main effects often ignored when there is an interaction? | An interaction can distort, conceal, or exaggerate the main effects of the IVs. They're ignored because if the main effects ARE statistically significant, they are further qualified by the interaction effect, which is described in the summary |
| You would not need to run a post-hoc if you have under _____ levels | Three |
| Quantitative Interaction | An interaction in which the effect of one independent variable is strengthened or weakened at one or more levels of the other independent variable, but the direction of the initial effect does not change. |
| Qualitative Interaction | A particular type of quantitative interaction of two (or more) independent variables in which one independent variable reverses its effect depending on the level of the other independent variable. |
| Marginal Mean | The mean of a row or a column in a table of cells: the average score for one level of one factor, ignoring the other factor. Ex: a table of cells shows two factors, sex (m/f) and pet ownership (y/n). Each factor has two levels, so you'll need to find four marginal means: male, female, pets, no pets |
| How many critical values will you need to look up in order to analyze where your F stats fall into play for a two-way between-groups ANOVA? | You will need three critical values, based on F(between df, within df): one for each of the two main effects and one for the interaction. All three F statistics could go beyond their critical values, or only the interaction effect might be significant |
| What are the four sources of variability in a two-way ANOVA? | There are three between-groups mean squares—one for each main effect and one for the interaction—and one within-groups mean square. Example: Age (between rows) Repetitions (between columns) Age × repetitions (between/interaction) Within |
| Mixed-design ANOVA | Used to analyze the data from a study with at least two IVs; at least one of the IVs must be within-groups and at least one variable must be between-groups -includes both within-groups variables and between-groups variables |
| R² effect sizes ANOVA | Small= .01 Medium= .09 Large= .25 |
| Effect size for two-way ANOVA | R²rows = SSrows/(SStotal − SScolumns − SSinteraction). Do the same for R²columns and R²interaction. Three effect sizes |
| Correlation coefficient | A statistic that quantifies a relation between two variables |
| Positive Correlation | An association between two variables such that participants with high scores on one variable tend to have high scores on the other variable, and those with low scores on one variable tend to have low scores on the other variable; slopes up and right |
| Negative Correlation | An association between two variables in which participants with high scores on one variable tend to have low scores on the other variable; slopes down and right |
| Size of correlation coefficient | The magnitude (absolute value) of the coefficient; indicates the strength of the association |
| Sign of correlation coefficient | Direction of correlation |
| Correlation effect sizes | Small=.10 Medium=.30 Large=.50 |
| Scatterplots | It is important to construct a scatterplot for correlations because you can see if there is a general trend with your data and you can also see if there are any obvious outliers that may be skewing the data |
| What would a correlation coefficient of +/-1.00 indicate about the data? | That our data falls in a perfectly straight line moving in the positive direction (+1.00) or negative direction (-1.00) -this is highly unlikely to happen in real life |
| What does a correlation coefficient of 0.00 indicate about the data? | That our data has no correlation and that there is no association between our two variables |
| Pearson Correlation coefficient assumptions | 1. Random selection is used 2. Underlying population distribution for the two variables must be approximately normal 3. Each variable should vary equally, no matter the magnitude of the other variable |
| Pearson correlation coefficient degrees of freedom | df= N-2 |
| Correlation coefficient (r)= | Σ[(X − Mx)(Y − My)] / the square root of (SSx)(SSy) (see the Pearson r sketch after the table) |
| (SSx)(SSy) in correlation coefficient | Removes the influence of sample size in order to calculate the variance |
| Psychometrics | The branch of statistics used in the development of tests and measures |
| Reliability | Consistency |
| Test-retest reliability | Refers to whether the scale being used provides consistent information every time the test is taken. To calculate a measure's test-retest reliability, the measure is given twice to the same sample, typically with a delay between administrations |
| Internal reliability | Do all questions on the test measure the same thing? Examining correlations with each individual item and the overall score |
| Validity | Measures what is intended to be measured |
| Simple linear regression | A statistical tool that lets us predict an individual's score on the dependent variable based on the score of the independent variable. One independent variable. -also allows us to calculate the equation for a straight line that describes the data |
| Intercept | Predicted value of y when x=0. The point at which the line crosses, or intercepts, the y-axis |
| Slope | Amount that y is predicted to increase for an increase of 1 in x. |
| Regression to the mean | The tendency of scores that are particularly high or low to drift toward the mean over time |
| Standard regression equation | ẑy = (rxy)(zx), where rxy is the correlation coefficient. Compute two ẑy values (one for X = 0 and one for X = 1); both get plugged into the ŷ formula |
| Zx= | (X − Mx)/SDx (compute once with X = 0 and once with X = 1) |
| (X-Mx)/SDx when x=0 | Ex: plugged into rest of equation: GPA if we had no poverty at all |
| (X-Mx)/SDx when x=1 | Ex: plugged into rest of equation: GPA when x=1 |
| Linear regression equation | ŷ = (ẑy)(SDy) + My |
| Standardized regression coefficient | Beta; the standardized version of the slope: the predicted change in Y, in standard deviations, for an increase of one standard deviation in X |
| Standard error of the estimate | A statistic indicating the typical distance between a regression line and the actual data points; it is the amount of error around the line of best fit and it can be quantified -the standard deviation of the actual data points around the regression line |
| Limitations of regression | Cannot establish causation, can be used only when there's a linear relationship, issues of generalization, regression to the mean |
| Proportionate reduction in error | Also called the coefficient of determination; it tells us how good the regression equation is in comparison to using the mean as the predictor |
| Easiest way to compute a regression line | b = rxy(SDy/SDx); a = My − b(Mx) *this is in the equation packet (see the regression sketch after the table) |
| Regression: a | The y value when x=0 |
| Regression: b | The slope |
| Multiple Regression | A statistical technique that includes two or more predictor variables, independent of each other, in a prediction equation; the predictors are also called covariates. Two or more independent variables |
| Structural equation Modeling (SEM) | A statistical technique that quantifies how well sample data "fit" a theoretical model that hypothesizes a set of relations among multiple variables -encourages researchers to think of variables as a series of connections |
| Manifest variable | The variables in a study that we can observe and that are measured: ex: vocabulary scores, math scores |
| Latent variable | The ideas that we want to research but cannot directly measure. ex: intelligence |
| Non-parametric tests | Don't require the assumptions about the population that parametric tests do. Use when the sample size is small or the underlying population is not normal |
| Limitations of non-parametric tests | Cannot easily use confidence intervals or effect sizes, have less statistical power, more likely to commit a Type II error; nominal and ordinal data provide less information |
| Chi-square goodness of fit | No IV or DV, just one categorical variable with two or more categories into which participants are placed. Measures how good the fit is between the observed data in the various categories and the data we would expect according to the null hypothesis (see the chi-square sketch after the table) |
| Orthogonal variable | An independent variable that makes a separate and distinct contribution in the prediction of a dependent variable as compared with another variable |
| Regression: smaller standard error of estimate when... | The points on a graph are closely wrapped around the regression line (line of best fit) |
| Cramer's V | The standard effect size used with the chi-square test for independence; also called Cramér’s phi, symbolized as ϕ. |
| Assume a positive correlation is found between the number of hours students spend studying for an exam and their grade on the exam. If the regression equation for these data is calculated and the y intercept is 65, what conclusion can be drawn? | When students do not study at all, we would predict a score of 65 on the exam. |
| For the three statistics F, z, and t, divide _____ variability by _____ variability to analyze the relation between variables. | Between-groups; within-groups |
| Does the F distribution take into account individual differences when comparing sample means? | Yes; as the variability within the individual samples decreases, the F statistic becomes larger since the distributions are not overlapping very much. |
| The subjects sum of squares calculated in the one-way within-groups ANOVA assesses: | The variability due to participant differences. |
| Correlation is linear | It's an assumption! The Pearson correlation coefficient quantifies a linear relation between two scale variables: a number that describes the direction and strength of the relation between the two variables when their overall pattern indicates a straight line |
| As with the Pearson correlation coefficient, we are not able to use simple linear regression if the data .... | Do not form the pattern of a straight line |
| With simple linear regression, the standardized regression coefficient is identical to | The Pearson correlation coefficient. |
| Type 2 Error | Failing to reject the null hypothesis when the null hypothesis is false |
| The best way to allow yourself to set a low alpha level and have a good chance of rejecting the null when it is false is to... | Increase sample size |
| To lower risk of making a type 1 error, you must use a lower value for α. However, using a lower value for alpha means _______ | That you will be less likely to detect a true difference if one really exists. |
| In-line outliers | Increase the correlation |
| Out-of-line outliers | Reduce the correlation |
| Why correlation does not equal causation | There are three possible explanations for a correlation between A and B: A → B, B → A, or C → A and B (a third variable causes both) |
| In Chi-Square goodness of fit test, the null hypothesis assumes that | There is no significant difference between the observed and the expected value. |
| Null hypothesis for Chi-square independence test | Assumes that there is no association between the two variables. |
| Goodness of fit: if we hope to receive empirical support for the research hypothesis (reject the null), then we’re hoping for | A bad fit between the observed data and what we expect |
| Chi square assumption: There is a minimum number of expected participants in every category (cell) | At least 5 and preferably more. An alternative guideline is for there to be at least five times as many participants as cells... The chi-square tests seem robust to violations of this assumption however |
| The table of cells for a chi-square test for independence is called a _______ | Contingency table |
| In a chi-square test where there are more than two levels of one of the variables, the _____ allow(s) you to make more specific conclusions rather than a general interpretation based on rejecting the null hypothesis. | Adjusted standardized residuals |
| Structural equation modeling graphs depict a _____ among several variables, demonstrating how all of the variables combine to create a _____. | network of relations; statistical model |
| When drawing a line of best fit, it is “best” to use _____ point(s) of _____ value(s). | at least 2; low and high |
| Partial correlation | Controlling for a 3rd variable while looking at the relationship between 2 variables; helps us understand how 2 variables are related independent of a 3rd. Ex: the correlation between number of absences and exam grade, controlling for time spent on homework |
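
The familywise-error card ("Why is using an ANOVA more beneficial than conducting multiple t-tests?") uses the rough additive approximation. A minimal Python sketch of the exact rate for independent tests, 1 − (1 − α)^k:

```python
# Familywise Type I error across k independent tests at alpha = .05.
# The card's 10%/15% figures are the rough additive approximation of this.
alpha = 0.05
for k in range(1, 6):
    familywise = 1 - (1 - alpha) ** k
    print(f"{k} test(s): familywise Type I error ≈ {familywise:.1%}")
```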
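The ANOVA sketch referenced on the SS, df, and MS cards: a one-way between-groups ANOVA computed directly from those definitions. All scores are made up for illustration.

```python
# One-way between-groups ANOVA from the SS/df/MS definitions on the cards.
# The scores are hypothetical.
groups = {
    "Group 1": [4.0, 5.0, 6.0],
    "Group 2": [7.0, 8.0, 9.0],
    "Group 3": [5.0, 6.0, 10.0],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {g: sum(s) / len(s) for g, s in groups.items()}

# SSbetween: (group mean - grand mean)^2, counted once per score
ss_between = sum(
    (group_means[g] - grand_mean) ** 2
    for g, scores in groups.items()
    for _ in scores
)
# SSwithin: (score - its own group mean)^2
ss_within = sum(
    (x - group_means[g]) ** 2
    for g, scores in groups.items()
    for x in scores
)

df_between = len(groups) - 1                           # Ngroups - 1
df_within = sum(len(s) - 1 for s in groups.values())   # df1 + df2 + ...

ms_between = ss_between / df_between   # MSbetween = SSbetween / dfbetween
ms_within = ss_within / df_within      # MSwithin = SSwithin / dfwithin
f_stat = ms_between / ms_within        # F = MSbetween / MSwithin
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```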
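The Tukey HSD sketch referenced on the N', sM, and HSD cards. MSwithin, the group sizes, and the pair of means compared are hypothetical placeholders.

```python
import math

# Tukey HSD pieces from the cards, with hypothetical numbers.
ms_within = 2.5                # MSwithin from the ANOVA (hypothetical)
sizes = [5, 7, 6]              # unequal group sizes, so N' is needed

# N' = Ngroups / (sum of 1/N): the harmonic mean of the sample sizes
n_prime = len(sizes) / sum(1 / n for n in sizes)
# sM = sqrt(MSwithin / N'): the standard error for the comparison
s_m = math.sqrt(ms_within / n_prime)

m1, m2 = 8.0, 5.5              # one hypothetical pair of group means
hsd = (m1 - m2) / s_m          # compare against the critical q value
print(f"N' = {n_prime:.2f}, sM = {s_m:.2f}, HSD = {hsd:.2f}")
```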
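The Pearson r sketch referenced on the correlation coefficient card, computed straight from the deviation-score formula; the paired scores are made up.

```python
import math

# Pearson r = sum[(X - Mx)(Y - My)] / sqrt(SSx * SSy), per the card.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.5, 3.5, 3.0, 5.0]
mx, my = sum(x) / len(x), sum(y) / len(y)

numerator = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
ss_x = sum((xi - mx) ** 2 for xi in x)   # SSx
ss_y = sum((yi - my) ** 2 for yi in y)   # SSy

r = numerator / math.sqrt(ss_x * ss_y)
print(f"r = {r:.3f}, df = {len(x) - 2}")   # df = N - 2
```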
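The regression sketch referenced on the "easiest way to compute a regression line" card, using b = rxy(SDy/SDx) and a = My − b(Mx); the data are the same hypothetical pairs as the Pearson sketch.

```python
import math

# Regression line via b = rxy(SDy/SDx) and a = My - b(Mx).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.5, 3.5, 3.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

ss_x = sum((xi - mx) ** 2 for xi in x)
ss_y = sum((yi - my) ** 2 for yi in y)
sd_x = math.sqrt(ss_x / n)
sd_y = math.sqrt(ss_y / n)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / math.sqrt(ss_x * ss_y)

b = r * (sd_y / sd_x)   # slope
a = my - b * mx         # intercept: predicted y when x = 0
print(f"y-hat = {a:.2f} + {b:.2f}x")
```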
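The chi-square sketch referenced on the goodness-of-fit card: observed counts compared against the counts expected under the null hypothesis. The counts are hypothetical, and the null here assumes equal category frequencies.

```python
# Chi-square goodness of fit: chi^2 = sum((O - E)^2 / E) over the categories.
observed = [18, 30, 12]
n = sum(observed)
expected = [n / len(observed)] * len(observed)   # null: no difference

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1   # number of categories minus 1
print(f"chi-square({df}) = {chi_square:.2f}")
```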