# Stats

Term | Definition |
---|---|

association: | Values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other. |

bias: | A condition where the mean of the statistic values differs from the parameter that the statistic estimates. |

bivariate data: | Data collected on two variables for each individual in a study. |

Central Limit Theorem: | The statement that the sampling distribution of x̄ is approximately normal whenever the sample is large and random. |

conditional distribution: | The distribution of the values in a single row (or a single column) of a two-way table. |

control chart: | A statistical tool for monitoring the input or output of a process. |

control limits: | μ − 3σ/√n and μ + 3σ/√n; used to detect out-of-control signals in a control chart. |
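As a small worked example of the control-limit formula, the sketch below computes μ ± 3σ/√n. The function name `control_limits` and the input values are illustrative assumptions, not part of the glossary.

```python
import math

def control_limits(mu, sigma, n):
    """Return (lower, upper) three-sigma limits for means of samples of size n."""
    # The standard deviation of the sample mean is sigma / sqrt(n).
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin

# Hypothetical process: mean 50, standard deviation 4, samples of size 16.
lower, upper = control_limits(mu=50.0, sigma=4.0, n=16)
print(lower, upper)  # 47.0 53.0
```

Because the margin is divided by √n, larger samples give narrower limits for the same process.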

correlation coefficient: | A measure of the strength of the linear relationship between two quantitative variables. |
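One common way to compute the correlation coefficient r by hand is sketched below, using sums of deviations from the means. The function name `correlation` and the sample data are assumptions made for illustration.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired quantitative data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear, increasing relationship gives r = 1.
print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
```

Values near 1 or −1 indicate a strong linear relationship; values near 0 indicate a weak one.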

disjoint events: | Events that cannot occur simultaneously. |

distribution of a variable: | A list of the possible values of a variable together with the frequency of each value. (Note: probabilities can be given instead of frequencies.) |

event: | A single outcome or a combination of outcomes from a random phenomenon. |

extrapolation: | Predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off. |

inference: | Using results from a sample statistic value to draw conclusions about the population parameter. |

influential observation: | An observation that substantially alters the values of the slope and y-intercept in the regression equation when it is included in the computations. |

law of large numbers: | The fact that the average (x̄) of observed values in a sample gets closer and closer to μ as the sample size increases. |
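The law of large numbers can be illustrated with a short simulation. The sketch below rolls a fair six-sided die (so μ = 3.5) and tracks the running average; the function name `running_means`, the seed, and the number of rolls are arbitrary choices for illustration.

```python
import random

def running_means(n_rolls, seed=0):
    """Return the running average after each of n_rolls fair die rolls."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = 0
    means = []
    for i in range(1, n_rolls + 1):
        total += rng.randint(1, 6)  # one roll, values 1 through 6
        means.append(total / i)
    return means

means = running_means(100_000)
for n in (10, 1_000, 100_000):
    # The average after n rolls; it should settle near mu = 3.5 for large n.
    print(n, round(means[n - 1], 3))
```

The early averages can wander well away from 3.5; only the long-run behavior is guaranteed.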

laws of probability: | The basis for hypothesis testing and confidence interval estimation. |

least squares: | A method for finding the equation of a line that minimizes the sum of squared residuals. |

least squares regression line: | The line with the smallest sum of squared residuals. |
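A minimal sketch of fitting the least squares regression line for one explanatory variable follows, using the usual closed-form slope and intercept. The function names and the five data points are illustrative assumptions.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # the line passes through (x-bar, y-bar)
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """SSE: sum of (observed y minus predicted y) squared."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)
print(sum_squared_residuals(xs, ys, slope, intercept))
# Any other line, such as this one with a steeper slope, has a larger SSE.
print(sum_squared_residuals(xs, ys, slope + 0.1, intercept))
```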

lurking variable: | A variable that is not measured but explains association between two variables that are measured. |

marginal distribution: | The distribution of the values in the “total” row (or the “total” column) of a two-way table. |
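To make the marginal and conditional distributions concrete, the sketch below computes both from a small two-way table. The table contents, the row and column labels, and the function names are all hypothetical.

```python
# Hypothetical two-way table of counts: rows are groups, columns are responses.
table = {
    "group_1": {"yes": 30, "no": 20},
    "group_2": {"yes": 10, "no": 40},
}

def marginal_distribution(table):
    """Distribution of the column variable taken from the 'total' row."""
    columns = list(next(iter(table.values())))
    grand_total = sum(sum(row.values()) for row in table.values())
    return {c: sum(row[c] for row in table.values()) / grand_total
            for c in columns}

def conditional_distributions(table):
    """Distribution of the column variable within each single row."""
    return {name: {c: count / sum(row.values()) for c, count in row.items()}
            for name, row in table.items()}

print(marginal_distribution(table))
print(conditional_distributions(table))
```

Here the two conditional distributions differ from each other and from the marginal distribution, which is the pattern the glossary describes as an association.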

mean of the sampling distribution of x̄: | The mean of all the sample means (x̄'s) from all possible samples of size n from a population; equals μ. |

μ: | The symbol for the mean of the population. |

no association: | A condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other). |

out-of-control process: | A process signaled by one sample mean more than three standard deviations of x̄ from the center line, or by nine sample means in a row all above or all below the center line. |
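The two out-of-control signals can be checked mechanically. The sketch below is one possible implementation, assuming the center line is μ and the standard deviation of x̄ is σ/√n; the function name, the signal labels `"limit"` and `"run"`, and the sample values are hypothetical.

```python
import math

def out_of_control_signals(sample_means, mu, sigma, n, run_length=9):
    """Return a list of (index, kind) signals for a sequence of sample means."""
    margin = 3 * sigma / math.sqrt(n)
    signals = []
    run_side = 0   # +1 above the center line, -1 below, 0 exactly on it
    run_count = 0
    for i, xbar in enumerate(sample_means):
        # Signal 1: a sample mean beyond the three-sigma control limits.
        if abs(xbar - mu) > margin:
            signals.append((i, "limit"))
        # Signal 2: run_length consecutive means on the same side of center.
        side = (xbar > mu) - (xbar < mu)
        if side != 0 and side == run_side:
            run_count += 1
        else:
            run_side = side
            run_count = 1 if side != 0 else 0
        if run_count >= run_length:
            signals.append((i, "run"))
    return signals

# Nine means in a row slightly above the center line trigger a run signal.
print(out_of_control_signals([50.5] * 9, mu=50, sigma=4, n=16))
# A single mean far from the center line triggers a limit signal.
print(out_of_control_signals([50, 54, 49], mu=50, sigma=4, n=16))
```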

outlier: | An observation that falls outside the overall pattern of the data set. |

parameter: | A characteristic of a population that is usually unknown; this could be the mean, median, proportion, or standard deviation computed on all the data from the population. A parameter does not have variability. |

parameter symbols: | μ, σ, and p (mean of population, standard deviation of population, proportion of a population, respectively) |

positive association: | High values of one variable tend to associate with high values of another variable. |

probability of an outcome: | A measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome. |

process: | Sequence of operations used in production, manufacturing, etc. |

process in statistical control: | A process whose inputs and outputs exhibit natural variation when observed over time. |

quality control chart: | A chart plotting the means (x̄) of regular samples of size n against time; this chart is used to assess whether the process is in control. |

quantitative bivariate: | The type of data required for regression analysis. |

r: | The symbol for correlation coefficient. |

r²: | The percentage of total variation in the response variable, Y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, Y, that is explained by the explanatory variable, X. |
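The "explained variation" reading of r² can be written as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total variation in Y. The sketch below assumes the predictions come from the least squares line (the identity with the squared correlation holds in that case); the function name and the data are illustrative.

```python
def r_squared(ys, predicted):
    """Fraction of the total variation in y explained by the predictions."""
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)                 # total variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # unexplained
    return 1 - sse / sst

# Observed y values and predictions from the least squares line
# y-hat = 2.2 + 0.6x fitted to x = 1, 2, 3, 4, 5 (hypothetical data).
ys = [2, 4, 5, 4, 5]
predicted = [2.2 + 0.6 * x for x in (1, 2, 3, 4, 5)]
print(r_squared(ys, predicted))
```

Multiplying the result by 100 gives the percentage described in the definition.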

random: | Describes a phenomenon whose individual outcomes are uncertain but whose outcomes follow a regular distribution in the long run. |

regression equation: | A formula for a line that models a linear relationship between two quantitative variables. |

residual: | The observed y minus the predicted y; denoted y − ŷ. |

residual plot: | A diagnostic plot of the residuals against the explanatory variable, used to assess how well the regression line fits the data. |

sample mean (x̄): | The random variable of the sampling distribution of x̄. |

sample space: | The list of all possible outcomes of a random phenomenon. |

sampling distribution: | A distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value. |

sampling distribution of x̄: | A list of all the possible values for x̄ together with the frequency (or probability) of each value; in other words, the distribution of all x̄'s from all possible samples. |
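For a very small population, the sampling distribution of x̄ can be listed exhaustively. The sketch below enumerates every sample of size n drawn without replacement and tabulates the sample means; the population values and the function name are hypothetical, and exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

def sampling_distribution_of_mean(population, n):
    """Map each possible sample mean to its probability over all samples."""
    # Every sample of size n, drawn without replacement, is equally likely.
    means = [Fraction(sum(sample), n) for sample in combinations(population, n)]
    counts = Counter(means)
    return {m: Fraction(c, len(means)) for m, c in sorted(counts.items())}

population = [2, 4, 6, 8]  # hypothetical population with mu = 5
dist = sampling_distribution_of_mean(population, 2)
for mean, prob in dist.items():
    print(mean, prob)

# The mean of the sampling distribution equals the population mean.
print(sum(mean * prob for mean, prob in dist.items()))
```

The last line illustrates why x̄ is called an unbiased estimator of μ: the mean of all possible sample means equals the parameter.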

sampling variability: | The variability of sample results from one sample to the next; something we must measure in order to effectively do inference. |

scatterplot: | A two-dimensional plot used to examine the strength, direction, and type of the relationship between two variables. |

Simpson's paradox: | A condition where the percentages reverse when a third variable is ignored; it leads to misinterpretation of the direction of association between two variables because the ignored third variable is associated with both of the reported variables. |
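The reversal in Simpson's paradox is easiest to see with numbers. The counts below are invented purely for illustration: treatment A has the higher success rate within each severity level, yet treatment B looks better once severity (the third variable) is ignored.

```python
# Hypothetical (successes, trials) counts, invented to show the reversal.
counts = {
    "A": {"mild": (9, 10), "severe": (30, 90)},
    "B": {"mild": (80, 90), "severe": (3, 10)},
}

def success_rate(successes, trials):
    return successes / trials

def overall_rate(by_severity):
    """Success rate after pooling the severity levels together."""
    successes = sum(s for s, _ in by_severity.values())
    trials = sum(t for _, t in by_severity.values())
    return successes / trials

for severity in ("mild", "severe"):
    rate_a = success_rate(*counts["A"][severity])
    rate_b = success_rate(*counts["B"][severity])
    print(severity, round(rate_a, 3), round(rate_b, 3))  # A is higher in both

# Pooled over severity, the comparison flips in favor of B.
print("overall", overall_rate(counts["A"]), overall_rate(counts["B"]))
```

The flip happens because severity is tied to both variables: A is used mostly on severe cases, and severe cases succeed less often under either treatment.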

simulation: | Using random numbers to imitate chance behavior. |

slope: | A measure of the average change in the response variable for every one unit increase in the explanatory or independent variable. |

standard deviation (s): | A measure of the variability of data in a sample about x̄. |

standard deviation of x̄ (also called the standard deviation of the sampling distribution of x̄): | A measure of the variability of the values of the statistic x̄ about μ; a measure of the variability of the sampling distribution of x̄; in other words, the average amount that the statistic, x̄, deviates from its associated parameter. Computed as σ/√n. |

statistic: | A number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter. |

statistic symbols: | x̄, s, p̂ (mean of sample, standard deviation of sample, proportion of sample, respectively) |

statistical process control: | A procedure used to check a process at regular intervals to detect problems and correct them before they become serious. |

sum of squared residuals (or error): | The total obtained by squaring each residual and adding the results; denoted SSE. |

total variation in Y: | The sum of the squared deviations of the Y observations about their mean, ȳ. |

two-way table: | A table containing counts for two categorical variables. It has r rows and c columns. |

unbiased: | A condition where the mean of the statistic values equals the parameter that the statistic estimates. |

unexplained variation: | The sum of squared residuals. |

X: | The symbol for explanatory variable. |

x̄-chart: | A plot of sample means over time used to assess whether a process is in control. |

Y: | The symbol for response variable. |

ŷ: | The symbol for predicted y. |

z-score: | A measure of the number of standard deviations of a value or observation from the mean. |
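A z-score is a one-line computation once the mean and standard deviation are known. The sketch below uses a hypothetical function name and made-up values.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / sd

# Hypothetical example: a value of 130 with mean 100 and standard deviation 15.
print(z_score(130, 100, 15))  # 2.0
```

A positive z-score means the value is above the mean; a negative one means it is below.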

