# Stats

Term | Definition |
---|---|

association: | Values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other. |

bias: | A condition where the mean of the statistic values differs from the parameter that the statistic estimates. |

bivariate data: | Data collected on two variables for each individual in a study. |

Central Limit Theorem: | The statement that the sampling distribution of x̄ is approximately normal whenever the sample is large and random. |

conditional distribution: | The distribution of the values in a single row (or a single column) of a two-way table. |

control chart: | A statistical tool for monitoring the input or output of a process. |

control limits: | μ − 3σ/√n and μ + 3σ/√n; used to detect out-of-control signals in a control chart. |

correlation coefficient: | A measure of the strength of the linear relationship between two quantitative variables. |

disjoint events: | Events that cannot occur simultaneously. |

distribution of a variable: | A list of the possible values of a variable together with the frequency of each value. (Note: probabilities can be given instead of frequencies.) |

event: | A single outcome or a combination of outcomes from a random phenomenon. |

extrapolation: | Predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off. |

inference: | Using results from a sample statistic value to draw conclusions about the population parameter. |

influential observation: | An observation that substantially alters the values of the slope and y-intercept in the regression equation when it is included in the computations. |

law of large numbers: | The fact that the average (x̄) of observed values in a sample will get closer and closer to μ as the sample size increases. |

laws of probability: | The basis for hypothesis testing and confidence interval estimation. |

least squares: | A method for finding the equation of a line that minimizes the sum of squared residuals. |

least squares regression line: | The line with the smallest sum of squared residuals. |

lurking variable: | A variable that is not measured but explains association between two variables that are measured. |

marginal distribution: | The distribution of the values in the “total” row (or the “total” column) of a two-way table. |

mean of the sampling distribution of x̄: | The mean of all the sample means (x̄'s) from all possible samples of size n from a population; equals μ. |

μ: | The mean of the population. |

no association: | A condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other). |

out-of-control process: | One sample mean outside three standard deviations of x̄, or nine sample means in a row above or below the center line. |

outlier: | An observation that falls outside the overall pattern of the data set. |

parameter: | A characteristic of a population that is usually unknown, such as a mean, median, proportion, or standard deviation computed on all the data from the population; a parameter does not have variability. |

parameter symbols: | μ, σ, and p (mean of population, standard deviation of population, proportion of a population, respectively) |

positive association: | High values of one variable tend to associate with high values of another variable. |

probability of an outcome: | A measure of the proportion of times an outcome occurs in a very long series of repetitions that gives us an indication of the likelihood of the outcome. |

process: | Sequence of operations used in production, manufacturing, etc. |

process in statistical control: | A process whose inputs and outputs exhibit natural variation when observed over time. |

quality control chart: | A chart plotting the means x̄ of regular samples of size n against time; this chart is used to assess whether the process is in control. |

quantitative bivariate: | The type of data required for regression analysis. |

r: | The symbol for correlation coefficient. |

r²: | The percentage of total variation in the response variable, Y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, Y, that is explained by the explanatory variable, X. |

random: | Describes a phenomenon whose individual outcomes are uncertain but whose outcomes follow a regular distribution in the long run. |

regression equation: | A formula for a line that models a linear relationship between two quantitative variables. |

residual: | The observed y minus the predicted y; denoted y − ŷ. |

residual plot: | A diagnostic plot of the explanatory variable versus the residuals, used to assess how well the regression line fits the data. |

sample mean x̄: | The random variable of the sampling distribution of x̄. |

sample space: | The list of all possible outcomes of a random phenomenon. |

sampling distribution: | A distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value. |

sampling distribution of x̄: | A list of all the possible values for x̄ together with the frequency (or probability) of each value; in other words, the distribution of all x̄'s from all possible samples. |

sampling variability: | The variability of sample results from one sample to the next; something we must measure in order to effectively do inference. |

scatterplot: | A two-dimensional plot used to examine the strength of the relationship between two variables, as well as its direction and type. |

Simpson's paradox: | A condition where the percentages reverse when a third variable is ignored; a condition leading to misinterpretation of the direction of association between two variables, caused by ignoring a third variable that is associated with both of the reported variables. |

simulation: | Using random numbers to imitate chance behavior. |

slope: | A measure of the average change in the response variable for every one unit increase in the explanatory or independent variable. |

standard deviation (s): | A measure of the variability of data in a sample about x̄. |

standard deviation of x̄ (also called the standard deviation of the sampling distribution of x̄): | A measure of the variability of the values of the statistic x̄ about μ; a measure of the variability of the sampling distribution of x̄; in other words, the average amount that the statistic, x̄, deviates from its associated parameter. Computed as σ/√n. |

statistic: | A number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter. |

statistic symbols: | x̄, s, p̂ (mean of sample, standard deviation of sample, proportion of sample, respectively) |

statistical process control: | A procedure used to check a process at regular intervals to detect problems and correct them before they become serious. |

sum of squared residuals (or error): | The result of squaring the residuals and adding them together; denoted SSE. |

total variation in Y: | The sum of the squared deviations of the Y observations about their mean, ȳ. |

two-way table: | A table containing counts for two categorical variables. It has r rows and c columns. |

unbiased: | A condition where the mean of the statistic values equals the parameter that the statistic estimates. |

unexplained variation: | The sum of squared residuals. |

X: | The symbol for explanatory variable. |

x̄-chart: | A plot of sample means over time used to assess whether a process is in control. |

Y: | The symbol for response variable. |

ŷ: | The symbol for predicted y. |

z-score: | A measure of the number of standard deviations of a value or observation from the mean. |
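
Several of the terms above (least squares regression line, residual, sum of squared residuals, total variation in Y, r², control limits, z-score) are defined by short formulas, so a small worked sketch may help tie them together. The Python below is only an illustration: the function names (`least_squares`, `regression_summary`, `control_limits`, `z_score`) and the sample data are hypothetical and not part of the original term list, and r² is returned as a proportion rather than a percentage.

```python
import math


def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in X.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def regression_summary(xs, ys):
    """Return residuals, SSE, total variation in Y, and r-squared."""
    slope, intercept = least_squares(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]        # y-hat values
    residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]
    sse = sum(e ** 2 for e in residuals)                   # unexplained variation
    total_variation = sum((y - y_bar) ** 2 for y in ys)
    r_squared = 1 - sse / total_variation                  # proportion explained
    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "sse": sse,
        "total_variation": total_variation,
        "r_squared": r_squared,
    }


def control_limits(mu, sigma, n):
    """Return (lower, upper) control limits: mu -/+ 3 * sigma / sqrt(n)."""
    margin = 3 * sigma / math.sqrt(n)
    return mu - margin, mu + margin


def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd


if __name__ == "__main__":
    # Hypothetical bivariate data, chosen only for illustration.
    xs = [1, 2, 3, 4, 5]
    ys = [2, 4, 5, 4, 5]
    print(regression_summary(xs, ys))
    print(control_limits(mu=10, sigma=2, n=4))
    print(z_score(13, mean=10, sd=2))
```

With this made-up data, the residuals sum to approximately zero, which is a property of any least squares line fitted with an intercept.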

Created by:
davidkentclark