Descriptive Stats
| Term | Definition |
|---|---|
| n | sample size |
| N | population size |
| k | sampling interval in a systematic sample: every kth individual is selected (e.g., every 5th) |
| M | sample median |
| P₅₀ | population median (50th percentile) |
| μ ("mu") | population mean |
| x̄ ("x-bar") | sample mean |
| σ ("sigma") | population standard deviation |
| s | sample standard deviation |
| σ² ("sigma squared") | population variance |
| s² | sample variance |
| Q1 & Q3 | quartiles |
| Data | observed values of variable(s) measured on some sets of individuals |
| Individuals | the objects (people, things, etc.) described by the data set |
| Variable | a measured characteristic of an individual |
| Population | entire group of individuals that we want information on |
| Parameter | numerical summary of population |
| Sample | subset of the population that we actually examine to gather information |
| Statistic | numerical summary of a sample |
| Response variable | outcome under study in a statistical analysis |
| Lurking variable | variable that has an important effect on the relationship among the variables in the study but is not included among the variables studied |
| Observational study | measures characteristics of a population by observing individuals in a sample, without manipulating any variables |
| Designed experiment | applies treatments to individuals in a controlled setting in order to isolate the effects on the response variable |
| Nominal | categorical variable whose values cannot be ordered |
| Ordinal | categorical variable whose values can be ordered |
| Discrete | has a countable number of possible values |
| Continuous | has an infinite number of possible values that are not countable |
| Interval | differences between values of the variable have meaning (addition and subtraction work), but zero does not mean the absence of the property |
| Ratio | ratios between values of the variable have meaning (multiplication and division work), and zero means the absence of the property |
| Descriptive statistics | organization and summary of the information collected (plots, tables, numerical summaries, etc.) |
| Inferential statistics | estimates, predictions, or other generalizations about a population based on the information contained in a sample |
| Categorical vs. quantitative | if addition or subtraction is meaningful for a variable's values, the variable is quantitative; if not, it is categorical |
| Asymmetric | not symmetrical |
| Convenience sample | choosing the individuals easiest to reach; has no statistical justification and is usually not representative |
| Voluntary response/self-selection sample | people who choose themselves by responding, usually biased |
| Simple random sample (SRS) | a sample of size n in which every possible set of n individuals from the population has an equal chance of being selected |
| Stratified sample | separating the population into nonoverlapping groups (strata), then obtaining an SRS from each stratum |
| Systematic sample | selecting every kth individual from the population, where the 1st is randomly selected from the first k individuals |
| Cluster sample | selecting all individuals within a randomly selected collection or group of individuals |
| Bar graphs | used for categorical variables; category names are listed on the horizontal axis, bar height is the count, and there are spaces between bars |
| Pie chart | the area of each slice is proportional to that category's percentage of the sample |
| Skewed to the right | tail extends to the right, mean is greater than the median |
| Skewed to the left | tail extends to the left, mean is less than the median |
| Approx. symmetric | mean and median are about the same |
| Histogram | vertical axis is frequency or percent; horizontal axis is the scale of the variable |
| Lower class limit | smallest value within an interval/class |
| Upper class limit | largest value within an interval/class |
| Class width | difference between consecutive lower class limits |
| Range | max-min=range |
| standard deviation | the distance a "typical" observation falls from the mean |
| Variance | square of standard deviation |
| Quartiles | Q1 (25th percentile) and Q3 (75th percentile); together they cut off the middle 50% of the data |
| Empirical rule | for approximately bell-shaped distributions: about 68% of observations fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations |
| Z-score | measures relative standing by determining the # of SD an observation falls from the mean |
| Percentiles | measure relative standing as a percent; the kth percentile separates the lowest k% of observations from the highest (100 − k)% |
| 5-number summary | brief numerical description of the center and spread of a distribution: min., Q1, M, Q3, max. |
| outliers | points that fall outside the bounds, calculated either with the IQR (commonly below Q1 − 1.5·IQR or above Q3 + 1.5·IQR) or as values more than 2σ from the mean (x < μ − 2σ or x > μ + 2σ) |
| extreme outlier | falls much farther outside the bounds: more than 3σ from the mean |
| r | correlation |
| yᵢ | observed y-value for the ith observation |
| ŷᵢ ("y-hat") | predicted y-value for the ith observation |
| residual | observed y minus predicted y (yᵢ − ŷᵢ); a good fit keeps the residuals as small as possible |
| b₁ | slope of the fitted regression line |
| b₀ | intercept of the fitted regression line |
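
To make several of the terms above concrete, here is a minimal Python sketch that computes the sample mean, median, standard deviation, variance, quartiles, IQR-based outlier bounds, z-scores, and a fitted line with its residuals. The data values are made up purely for illustration, and the 1.5·IQR bounds are the common textbook convention rather than something this set specifies. Quartile rules also differ between textbooks and software, so Q1 and Q3 may not match a hand calculation exactly.

```python
import math
import statistics

# Hypothetical sample, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

n = len(data)                    # sample size
x_bar = statistics.mean(data)    # sample mean
M = statistics.median(data)      # sample median
s = statistics.stdev(data)       # sample standard deviation (divides by n - 1)
s2 = statistics.variance(data)   # sample variance, s squared
data_range = max(data) - min(data)

# Quartiles using the statistics module's default ("exclusive") method.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Outlier bounds from the IQR (assumed 1.5 * IQR convention).
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_bound or x > upper_bound]

# z-score: number of standard deviations each observation falls from the mean.
z_scores = [(x - x_bar) / s for x in data]

# 5-number summary: min, Q1, M, Q3, max.
five_number = (min(data), q1, M, q3, max(data))

# Hypothetical paired data for a fitted line y-hat = b0 + b1 * x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
mx, my = statistics.mean(xs), statistics.mean(ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)

b1 = sxy / sxx                   # slope
b0 = my - b1 * mx                # intercept
r = sxy / math.sqrt(sxx * syy)   # correlation

y_hat = [b0 + b1 * x for x in xs]               # predicted y-values
residuals = [y - yh for y, yh in zip(ys, y_hat)]  # observed minus predicted

print(five_number, outliers)
print(round(b0, 3), round(b1, 3), round(r, 3))
```

The slope and intercept here are computed with the least-squares formulas, which is an assumption about how the line is fitted; with that choice the residuals always sum to (approximately) zero.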