FQ1
Formula Quiz 1
| Question | Answer |
|---|---|
| individual | an object described by a set of data |
| variable | a characteristic of an individual |
| quantitative variable | a variable that takes on a numerical value that can be measured |
| quantitative data | values of quantitative variables |
| categorical/qualitative variable | a variable that places an individual into a category |
| distribution | indicates what values a variable takes on and the frequency at which it takes these values |
| graphs of qual. | pie chart, bar chart |
| bar chart | qual. |
| pie chart | qual. |
| graphs of quan. | dotplot, histogram, stemplot |
| dotplot | quan. |
| histogram | quan. |
| stemplot | quan. |
| outlier | an individual observation that falls outside the overall pattern of the graph |
| relative frequency histogram | has the same shape as a histogram with the exception that the vertical axis measures relative frequencies instead of frequencies |
| key features of a histogram | the center (mean, median), the spread (range), the shape |
| shapes of a graph. | symmetric, skewed left, skewed right |
| measures of center | sample mean, mode, median |
| sample mean | arithmetic average or arithmetic mean |
| mode | element or elements that occur most often |
| median | "the middle number"/average of two middle numbers |
| median position formula | (n+1)/2 (n=number of numbers in the data set) |
| mean=median when... | distribution is perfectly symmetric |
| when it is skewed right | the mean is dragged to the right |
| when it is skewed left | the mean is dragged to the left |
| measures of spread | range, iqr, five number summary, the variance and the sample standard deviation |
| range | largest #- smallest # |
| iqr | IQR= Q3-Q1 |
| five number summary | min, Q1, median, Q3, max |
| variance | sum of (xi - xbar)^2, divided by n-1 |
| standard deviation (equation) | square root of the variance: sqrt[ sum of (xi - xbar)^2 / (n-1) ] |
| standard deviation (definition) | a single number that measures how far the data values are spread out from the mean |
| xi-xbar | a deviation of xi from the mean |
| the sum of all the deviations from the mean | always equals 0 |
| st. dev. is .... to outliers | nonresistant (is affected by) |
| n-1 is.. | degrees of freedom |
| a data point is an outlier if... | it lies more than 1.5 × IQR below Q1 or above Q3 |
| boxplot | is a graph which displays five num summary of a set of data |
| modified boxplot | a graph that displays the five number summary of a data set and marks outliers separately (tests for outliers) |
| side-by-side boxplots | can be used to compare the distributions of two data sets |
| within one standard deviation of the mean | about 68% of the data will fall (for an approximately normal distribution) |
| two sample standard deviations from the mean | about 95% of the data will fall |
| three standard deviations from the mean | about 99.7% of the data will fall |
| z-score | measures how far a data point lies from the mean (using standard deviations as the unit) |
| equation for z-score | z = (x - xbar)/s |
| the sample mean of the z-scores is | 0 |
| the sample st. dev. of the z-scores is | 1 |
| cumulative frequency | is the number of observations less than or equal to a given number |
| cumulative relative frequency | cumulative frequency divided by the total number of observations |
| empirical distribution function | is a graph of the cumulative relative frequency vs. the raw data in the sample |
| a density curve | a curve that always lies on or above the horizontal axis and has an area of exactly 1 underneath |
| median of a density curve | is the point that divides the area under the curve in half |
| mean of a density curve | the point at which the curve would balance if it were made of a solid material |
| the standard normal distribution is..(mean/st. dev.) | a normal distribution with mean 0 and standard deviation 1 |
| conversion formula is used | to convert normal distribution values to standard normal distribution values |
| conversion formula (actual form) | z = (x - mu)/sigma |
| what does a z-score measure | the number of standard deviations between an observation x and the mean mu of the data set |
| normal quantile plot | graphs raw data (horizontal) versus their z-score (y-axis) |
| a data set is approximately normal when its | quantile plot is approximately linear |
| independent variable x is | the explanatory variable |
| dependent variable y | response variable |
| directions of scatterplots | positive association, negative association or neither |
| scatterplots are analyzed according to: | direction, form, strength of relationship, and outliers |
| correlation coefficient measures | the direction and strength of the linear relationship between two quantitative variables |
| formula for r | r = 1/(n-1) times the sum of [(xi - xbar)/sx] × [(yi - ybar)/sy] |
| the correlation coefficient r is always a number between | -1 and 1 |
| if r is positive then | x and y have a positive association |
| if r=1 then | x and y have a perfect positive correlation |
| if r is negative then | x and y have a negative association |
| if r=-1 then | x and y have a perfect negative correlation |
| least squares regression line is the equation.. | of the line that makes the sum of the squares of the residuals as small as possible |
| equation for the LSQR | yhat = bnaught + b1·x |
| bnaught is.. | y intercept |
| b1 is... | the slope of the line |
| equation for b1 | r(sy/sx) |
| equation for bnaught | ybar-b1xbar |
| ybar | the mean of the y coordinates |
| x bar is | the mean of the x coordinates |
| the difference between y and yhat is called | an error or a residual |
| residual is | the observed value of y minus the predicted value of y (y - yhat) |
| the point (xbar, ybar)... | is a point on every least squares regression line |
| rsquared is | called the coefficient of determination |
| rsquared measures | the proportion of the variation in y that is explained by y's linear association with x |
| a residual plot graphs.. | the residuals on the vertical axis and either the explanatory, response or predicted response values on the horizontal |
| residuals from a LSQR always | have a mean of 0 |
| the horizontal axis of a residual plot | corresponds to the regression line |
| an observation is influential if | removing it would markedly change the position of the regression line |
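
To see several of these formulas in action (sample standard deviation, z-scores, the 1.5 × IQR outlier rule, r, and the least squares line), here is a minimal Python sketch. The function names and the data values are made up for illustration and are not part of the quiz. The quartiles use the "median of each half" convention, which is only one of several conventions in use, so other software may give slightly different Q1 and Q3.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation: sqrt( sum of (xi - xbar)^2 / (n - 1) )."""
    xbar = sum(xs) / len(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

def z_scores(xs):
    """z = (x - xbar) / s for every observation."""
    xbar = sum(xs) / len(xs)
    s = sample_sd(xs)
    return [(x - xbar) / s for x in xs]

def quartiles(xs):
    """Q1 and Q3 as the medians of the lower and upper halves (assumed convention)."""
    xs = sorted(xs)
    half = len(xs) // 2
    lower = xs[:half]
    # For an odd n, the middle value is left out of both halves.
    upper = xs[half:] if len(xs) % 2 == 0 else xs[half + 1:]
    return statistics.median(lower), statistics.median(upper)

def outliers(xs):
    """Points more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    return [x for x in xs if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

def correlation(xs, ys):
    """r = 1/(n-1) times the sum of the products of the z-scores of x and y."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

def least_squares_line(xs, ys):
    """Return (bnaught, b1) with b1 = r(sy/sx) and bnaught = ybar - b1*xbar."""
    b1 = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    bnaught = sum(ys) / len(ys) - b1 * sum(xs) / len(xs)
    return bnaught, b1

# Made-up example data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
bnaught, b1 = least_squares_line(x, y)
residuals = [yi - (bnaught + b1 * xi) for xi, yi in zip(x, y)]

print("r =", round(correlation(x, y), 4))
print("yhat =", round(bnaught, 4), "+", round(b1, 4), "x")
print("sum of residuals =", round(sum(residuals), 10))
print("outliers:", outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30]))
```

The residuals should sum to (essentially) 0 and the line should pass through (xbar, ybar), which matches the cards above.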