Term | Definition |
association: | Values of one variable tend to occur with certain values of another variable; detected when the conditional distributions differ from the marginal distribution and from each other. |
bias: | A condition where the mean of the statistic values differs from the parameter that the statistic estimates. |
bivariate data: | Data collected on two variables for each individual in a study. |
Central Limit Theorem: | The statement that the sampling distribution of xbar is approximately normal whenever the sample is large and random. |
conditional distribution: | The distribution of the values in a single row (or a single column) of a two-way table. |
control chart: | A statistical tool for monitoring the input or output of a process. |
control limits: | The values μ − 3σ/√n and μ + 3σ/√n; used to detect out-of-control signals in a control chart. |
correlation coefficient: | A measure of the strength of the linear relationship between two quantitative variables. |
disjoint events: | Events that cannot occur simultaneously. |
distribution of a variable: | A list of the possible values of a variable together with the frequency of each value. (Note: probabilities can be given instead of frequencies.) |
event: | A single outcome or a combination of outcomes from a random phenomenon. |
extrapolation: | Predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off. |
inference: | Using results from a sample statistic value to draw conclusions about the population parameter. |
influential observation: | An observation that substantially alters the values of the slope and y-intercept in the regression equation when it is included in the computations. |
law of large numbers: | The fact that the average (xbar) of observed values in a sample will get closer and closer to μ as the sample size increases. |
laws of probability: | The basis for hypothesis testing and confidence interval estimation. |
least squares: | A method for finding the equation of a line that minimizes the sum of squared residuals. |
least squares regression line: | The line with the smallest sum of squared residuals. |
lurking variable: | A variable that is not measured but explains the association between two variables that are measured. |
marginal distribution: | The distribution of the values in the “total” row (or the “total” column) of a two-way table. |
mean of the sampling distribution of xbar: | The mean of all the sample means (xbar's) from all possible samples of size n from a population; equals μ. |
μ: | The mean of the population |
no association: | A condition where values of one variable occur independently of values of another variable; detected when the conditional distributions of a two-way table equal the marginal distribution (and each other). |
out-of-control process: | A process signaled by one sample mean more than three standard deviations of xbar from the center line, or by nine sample means in a row on the same side of the center line. |
outlier: | An observation that falls outside the overall pattern of the data set. |
parameter: | A characteristic of a population that is usually unknown; this could be a mean, median, proportion, or standard deviation computed on all the data from the population. A parameter does not have variability. |
parameter symbols: | μ, σ, and p (mean of population, standard deviation of population, proportion of a population, respectively) |
positive association: | High values of one variable tend to associate with high values of another variable. |
probability of an outcome: | The proportion of times an outcome occurs in a very long series of repetitions; it indicates the likelihood of the outcome. |
process: | Sequence of operations used in production, manufacturing, etc. |
process in statistical control: | A process whose inputs and outputs exhibit natural variation when observed over time. |
quality control chart: | A chart plotting the means xbar of regular samples of size n against time; this chart is used to assess whether the process is in control. |
quantitative bivariate: | The type of data required for regression analysis. |
r: | The symbol for correlation coefficient. |
r²: | The percentage of total variation in the response variable, Y, that is explained by the regression equation; in other words, the percentage of total variation in the response variable, Y, that is explained by the explanatory variable, X. |
random: | A phenomenon that describes the uncertainty of individual outcomes but gives a regular distribution of the outcomes in the long run. |
regression equation: | A formula for a line that models a linear relationship between two quantitative variables. |
residual: | The observed y minus the predicted y; denoted y − ŷ. |
residual plot: | A diagnostic plot of the residuals versus the explanatory variable, used to assess how well the regression line fits the data. |
sample mean xbar: | The random variable described by the sampling distribution of xbar. |
sample space: | The list of all possible outcomes of a random phenomenon. |
sampling distribution: | A distribution of a statistic; a list of all the possible values of a statistic together with the frequency (or probability) of each value. |
sampling distribution of xbar: | A list of all the possible values for xbar together with the frequency (or probability) of each value; in other words, the distribution of all xbar's from all possible samples. |
sampling variability: | The variability of sample results from one sample to the next; something we must measure in order to effectively do inference. |
scatterplot: | A two-dimensional plot used to examine the strength of the relationship between two variables as well as its direction and type. |
Simpson's paradox: | A condition where the percentages reverse when a third variable is ignored; it leads to misinterpretation of the direction of association between two variables because the ignored third variable is associated with both of the reported variables. |
simulation: | Using random numbers to imitate chance behavior. |
slope: | A measure of the average change in the response variable for every one-unit increase in the explanatory (independent) variable. |
standard deviation (s): | A measure of the variability of data in a sample about xbar. |
standard deviation of xbar (also called the standard deviation of the sampling distribution of xbar): | A measure of the variability of the values of the statistic xbar about μ; a measure of the variability of the sampling distribution of xbar; in other words, the average amount that the statistic, xbar, deviates from its associated parameter. Computed as σ/√n. |
statistic: | A number computed from sample data (without any knowledge of the value of a parameter) used to estimate the value of the parameter. |
statistic symbols: | xbar, s, p̂ (mean of sample, standard deviation of sample, proportion of sample, respectively) |
statistical process control: | A procedure used to check a process at regular intervals to detect problems and correct them before they become serious. |
sum of squared residuals (or error): | The total obtained by squaring each residual and adding the results; denoted SSE. |
total variation in Y: | The sum of the squared deviations of the Y observations about their mean, ybar. |
two-way table: | A table containing counts for two categorical variables. It has r rows and c columns. |
unbiased: | A condition where the mean of the statistic values equals the parameter that the statistic estimates. |
unexplained variation: | The sum of squared residuals. |
X: | The symbol for explanatory variable. |
xbar-chart: | A plot of sample means over time used to assess whether a process is in control. |
Y: | The symbol for response variable. |
ŷ: | The symbol for predicted y. |
z-score: | A measure of the number of standard deviations of a value or observation from the mean. |
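
Several of the terms above (least squares regression line, residual, SSE, total variation in Y, r, r², and control limits) are defined by short formulas, so a small worked sketch may help tie them together. The Python below is only an illustration under stated assumptions: the function names `least_squares_summary` and `control_limits` and the sample numbers are hypothetical and do not come from the glossary, and the slope and intercept use the usual textbook least squares formulas.

```python
from math import sqrt


def least_squares_summary(xs, ys):
    """Fit the least squares line to paired data and return related quantities.

    Hypothetical helper for illustration; xs is the explanatory variable X,
    ys is the response variable Y.
    """
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n

    # Sums of squared deviations and cross-products about the means.
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    total_variation = sum((y - ybar) ** 2 for y in ys)  # total variation in Y

    # Slope and y-intercept of the line that minimizes the sum of squared residuals.
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # Residual = observed y minus predicted y.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)  # unexplained variation

    r = sxy / sqrt(sxx * total_variation)  # correlation coefficient
    r_squared = 1 - sse / total_variation  # proportion of variation explained

    return {
        "slope": slope,
        "intercept": intercept,
        "residuals": residuals,
        "sse": sse,
        "total_variation": total_variation,
        "r": r,
        "r_squared": r_squared,
    }


def control_limits(mu, sigma, n):
    """Return (lower, upper) control limits: mu -/+ 3 * sigma / sqrt(n)."""
    half_width = 3 * sigma / sqrt(n)
    return mu - half_width, mu + half_width


if __name__ == "__main__":
    # Made-up data, used only to exercise the functions.
    summary = least_squares_summary([1, 2, 3, 4], [2, 4, 5, 8])
    print(summary["slope"], summary["intercept"], summary["r_squared"])
    print(control_limits(mu=10, sigma=2, n=4))
```

Computing r² as 1 − SSE / (total variation) mirrors the glossary's wording: the explained share is whatever is left after removing the unexplained variation. For a straight-line fit this agrees with squaring r.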