PSY 400, Chapter 14 (pp. 357-364)
Tests of Association: Correlation and Regression
| Term | Definition |
|---|---|
| Pearson correlations are used when you want to determine the degree of | linear association between two interval- or ratio-level variables (see the Python sketch after this table) |
| Pearson's r assumptions | two populations are normally distributed and the relationship between the variables is linear |
| Linear regression determines the joint impact/effect of one or more | independent variables on a single dependent variable (a fitted-model sketch follows the table). |
| Computing Spearman's ρ (rho) rather than Pearson's r may be more appropriate when your measurement scale is ordinal, | your data set violates the assumption of normality, or your two variables are not linearly related. |
| When you have a small sample size and a number of outliers in your data, | Spearman's ρ is generally a more conservative approach |
| Linear regression Assumptions: dependent variable is interval or ratio level; dependent and independent relationship is linear; | residuals are independent, have equal variance across all independent variable values, and are normally distributed. |
| residuals | differences between predicted and actual dependent variable values |
| Regression equation | expresses a linear relationship between a dependent variable and one or more independent variables |
| Regression coefficient: A quantity in a linear regression equation | that indicates the change in a dependent variable associated with a unit change in the independent variable. |
| y-intercept value | The point at which a regression line intersects the y-axis. |
| R-squared (in a regression with a single predictor) is identical to | the square of Pearson's r |
| Regression analysis will produce p values for the multiple correlation R and for each of the coefficients in the regression | equation, and confidence intervals may (and should) be constructed for each of these values |
| R (or R-squared) | serves as an effect size |
| Both ANOVA and regression fall under a more general statistical model known | as the general linear model |
| Regression is particularly useful in situations where you want to include variables that | cannot be experimentally manipulated (e.g., income, socioeconomic status) |
| Test on ordinal data | Mann-Whitney U test (see the sketch after this table) |
| Assumption violations | may affect the accuracy of confidence intervals, p values, and your estimates of quantities such as means and standard deviations, and may increase the probability of type I or type II errors |
| Independence of observations | measurements from one participant do not depend on the measurements from other participants |
| Nonnormal Distributions | Outliers, Skew |
| Outliers: tools for detecting them | leverage values, Cook's distance, residuals, histograms, and box-and-whisker plots (leverage and Cook's distance are sketched after this table) |
| Leverage value and Cook's distance | measures used to detect outliers in a data set. |
| residuals | Differences between actual values and predicted values in linear regression or ANOVA. |
| QQ plot (or normal quantile-quantile plot) | A graphical technique for identifying deviations from normality in a data set (sketched after this table) |
| quantile | any of a set of values of a variate that divide a frequency distribution into equal groups, each containing the same fraction of the total population |
| Shapiro-Wilk test and Kolmogorov-Smirnov test | statistical tests of whether a set of data values is normally distributed (sketched after this table) |
| statistical tests for checking assumptions | are hypothesis tests |
| low power for small sample sizes may prevent statistical tests | from identifying sizeable assumption violations |
| using the results of an assumption violation hypothesis test to determine which subsequent hypothesis test you then apply | runs the risk of increasing the probability of a type I error |
| Data transformations are sometimes useful for converting a skewed distribution into one that is more normal in shape, | including logarithmic and square- or cube-root transformations (see the sketch after this table) |
| Conclusions drawn from transformed data apply only to the transformed data, | which may make interpretation of your results more difficult. |
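
The sketches below are minimal Python illustrations of several terms in the table; all data, seeds, and variable names are hypothetical, not taken from the text. First, Pearson's r versus Spearman's ρ with SciPy:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(50, 10, size=30)           # interval-level variable
y = 0.8 * x + rng.normal(0, 5, size=30)   # linearly related variable

r, p_r = stats.pearsonr(x, y)        # assumes normality and linearity
rho, p_rho = stats.spearmanr(x, y)   # rank-based; assumes only monotonicity

print(f"Pearson r    = {r:.3f} (p = {p_r:.4f})")
print(f"Spearman rho = {rho:.3f} (p = {p_rho:.4f})")
```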
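
A simple-regression sketch with statsmodels showing the regression coefficient, y-intercept, residuals, R-squared, p values, and confidence intervals; the model and data are invented for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(100, 15, size=40)               # independent variable
y = 2.0 * x + 30 + rng.normal(0, 10, size=40)  # dependent variable

X = sm.add_constant(x)      # adds the column that estimates the y-intercept
model = sm.OLS(y, X).fit()

print(model.params)       # [y-intercept, regression coefficient]
print(model.rsquared)     # R-squared; equals Pearson's r squared here (one predictor)
print(model.pvalues)      # p value for each coefficient
print(model.conf_int())   # confidence interval for each coefficient
residuals = model.resid   # actual minus predicted dependent-variable values
```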
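
A Mann-Whitney U sketch for ordinal data, assuming two independent groups of hypothetical Likert-style ratings:

```python
from scipy import stats

group_a = [3, 5, 4, 4, 2, 5, 3, 4]   # e.g., Likert-scale ratings
group_b = [2, 3, 2, 1, 3, 2, 4, 2]

u_stat, p_value = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
```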
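
A sketch of the two outlier diagnostics named in the table, leverage and Cook's distance, computed from a fitted statsmodels OLS model; the 4/n cutoff is a common rule of thumb, not a rule from the text:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.normal(100, 15, size=40)
y = 2.0 * x + 30 + rng.normal(0, 10, size=40)
model = sm.OLS(y, sm.add_constant(x)).fit()

influence = model.get_influence()
leverage = influence.hat_matrix_diag   # one leverage value per observation
cooks_d, _ = influence.cooks_distance  # one Cook's distance per observation

# Flag observations whose Cook's distance exceeds the 4/n rule of thumb.
n = len(cooks_d)
print("Flagged rows:", [i for i, d in enumerate(cooks_d) if d > 4 / n])
```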
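
A QQ-plot sketch with SciPy and matplotlib, applied to a deliberately skewed sample so the departure from the reference line is visible:

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.exponential(scale=2.0, size=50)  # deliberately skewed data

# Plots sample quantiles against theoretical normal quantiles; points far
# from the reference line indicate departures from normality.
stats.probplot(sample, dist="norm", plot=plt)
plt.title("Normal QQ plot of a skewed sample")
plt.show()
```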
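
A sketch of the two normality tests, again on hypothetical data; note the table's caveat that such tests have low power in small samples:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
sample = rng.normal(loc=10, scale=2, size=25)

w, p_sw = stats.shapiro(sample)
print(f"Shapiro-Wilk: W = {w:.3f}, p = {p_sw:.4f}")

# kstest needs fully specified parameters; plugging in the sample mean and
# SD is common but strictly calls for the Lilliefors correction.
d, p_ks = stats.kstest(sample, "norm", args=(sample.mean(), sample.std()))
print(f"Kolmogorov-Smirnov: D = {d:.3f}, p = {p_ks:.4f}")
```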
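
Finally, a sketch of the transformations mentioned above, applied to a right-skewed (lognormal) sample:

```python
import numpy as np

rng = np.random.default_rng(2)
skewed = rng.lognormal(mean=0, sigma=1, size=100)  # right-skewed sample

log_t = np.log(skewed)    # logarithmic (values must be positive)
sqrt_t = np.sqrt(skewed)  # square root (values must be non-negative)
cbrt_t = np.cbrt(skewed)  # cube root (handles negative values too)
```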