Stats
| Term | Definition |
|---|---|
| Statistics | A collection of procedures and principles for gathering and analyzing data to help make decisions when faced with uncertainty. A *statistic* (singular) is a summary measure computed from sampled data |
| Descriptive Statistics | State facts that summarize the data that were collected (from a sample or a population) without generalizing beyond them |
| Inferential Statistics | Analyze the sample findings to make predictions about the larger population |
| Population | The entire group of interest |
| Sample | a subset of the population |
| Census | Measures every unit in the population. A census is not always possible because of limited time, effort, and money (resources) |
| Quantitative variable | Variables that record a numerical measurement or count (it makes sense to take the average of these variables). Ex: age, weight, shoe size, number of siblings, points scored, assists per game, views on TikTok |
| Categorical variables | Variables that place each individual into exactly one of several categories. Ex: favorite pizza store, team name, school name, grade, city (state), race/ethnicity, gender |
| Parameter | A summary measure of an entire population, such as the true mean or proportion |
| Observational Studies | Passive studies in which the researcher’s goal is to observe conditions in the past, present, or future without interfering in the process that generates the information |
| Experiments | Active studies in which the researcher’s goal is to manipulate experimental conditions to study their effect on an outcome |
| Surveys | Designed observational studies in which data are collected at a particular point in time |
| Discrete Data | Numerical data in which the number of possible values is finite or “countable”. Examples: a biology professor counts the number of students in attendance; a coach counts the number of players on the injury report |
| Continuous Data | Numerical data that can take any of infinitely many possible values (such as any value within a range), so the collection of values is not countable. Examples: weights of patients before starting a weight-loss program; lengths of Burmese pythons in Florida |
| Nominal Data (categorical data) | Data that consist of names, labels, or categories only; the data cannot be arranged in a meaningful order (low to high). Example: yes/no/undecided survey responses |
| Ordinal Data | Data that can be arranged in order, but differences (subtraction) either cannot be found or are not meaningful. Examples: 0-10 survey scales (pain tolerance, customer service, ratings), course grades, educational level |
| Interval Data | Data that can be arranged in order, and differences between values can be found and are meaningful, but there is no “true zero” at which none of the quantity is present. Examples: temperatures (°F or °C), SAT and ACT scores, IQ, calendar years |
| Simple Random Sample | Every possible group of n units from the population has the same chance of being selected. Example: obtain a list of Phillies season ticket holders, then use a computer or random number generator to randomly select the sample from that list |
| Stratified Random Sample | Divide the population into groups (strata) whose members share a common characteristic, then take an SRS (simple random sample) from each stratum. Example: take an SRS from each of the first-year, sophomore, junior, and senior classes of Arcadia students |
| Cluster Random Sample | Divide the population into clusters, randomly select some of the clusters, and sample every unit within the chosen clusters only. Example: obtain a list of all Arcadia classes, randomly choose a few of these classes, and sample every student in the chosen classes |
| Systematic Sample | Divide the ordered sampling frame into consecutive segments of equal size and choose the unit in the same position from each segment (e.g., take every 5th unit) |
| Voluntary Response Sample | Includes only those who elect to respond |
| Convenience Sample | Use the most convenient group available |
| Selection Bias (undercoverage) | Sampling that produces a sample that does not represent the population of interest. Examples: the population of interest is all Philadelphia Eagles fans, but an SRS is taken from a list of fans in South Philadelphia only; surveying grocery shoppers only at 12 pm |
| Nonresponse Bias | A representative sampling frame may be chosen, but a subset of it cannot be contacted or does not respond. Examples: political polls, Gallup polls |
| Response Bias | Participants provide incorrect information. Examples: a survey asking about age at first kiss; a leading question such as “How amazing was this product?” |
| Voluntary Response Bias | Bias that results when answers are received only from those who choose to respond |
| Response Bias (common causes) | Unclear questions, hard-to-answer questions, and interviewees who are uncomfortable answering the questions |
| Ratio Data | Data that can be arranged in order, differences are meaningful, and there is a true zero Heights of students, class times (length in minutes), number of children, income levels |
| Types of Observational Studies | Retrospective, Prospective, Surveys, and Cross-sectional studies |
| Retrospective (case-control) | Subjects are identified, and then data are collected from their past |
| Prospective (longitudinal) | Subjects are identified, and then data are collected on these subjects in the future |
| Cross-Sectional Study | Data are measured at one point in time. Example: ages of randomly selected adult males with heart disease |
| Experiments | Active studies in which the researcher’s goal is to manipulate experimental conditions to study their effect on the outcome. Key terms: factor, response, factor levels, treatment, control treatment, measurement unit, experimental unit, and replication |
| Factor | variables controlled by the researcher |
| Response | Variables thought to depend on the factors (the measured outcome) |
| Factor levels | the different settings of the factor |
| Treatment | the different combinations of factor levels |
| Control treatment | Benchmark treatment to which the effectiveness of the remaining treatments is compared |
| Measurement Unit | physical entity on which the measurement is taken |
| Experimental Unit | physical entity to which the treatment is randomly assigned |
| Replication | A single application of a treatment to one experimental unit; the number of replications does not have to be the same for each treatment group |
| Bad experiments in history | Tuskegee Syphilis Experiment • Henrietta Lacks • Mississippi Appendectomy Program • Nazi Human Experiments • Stanford Prison Experiment • Monster Study |
| Blinding | Conducting an experiment so that participants do not know which treatment they were assigned. Single blind: only the subjects are blinded. Double blind: the researchers are also blinded |
| Placebo | A fake treatment that cannot be distinguished from the real “treatment” being tested; placebos are the best way to blind subjects |
| Placebo effect | The tendency of people to show a change in response when given any treatment, even a fake one. Note: an experiment can include a placebo-controlled group |
| Blocking | When groups of experimental units are similar, it’s often a good idea to gather them together into blocks. Blocking isolates the variability due to differences between the blocks so that the differences due to the treatments can be seen more clearly |
| Completely Randomized Design | An experiment in which the treatments are randomly assigned to the experimental units. Example: athletes are randomly assigned to warm-up routines |
| Factorial Treatment Design (Completely Randomized Factorial Design) | An experiment with two or more factors in which the treatments are formed by combining levels of the factors. Example: police patrol strategy and shift length are varied together to study response time |
| Randomized Block Design | An experiment in which a completely randomized design is run within each block. Example: a training program is tested after blocking participants by years of experience |
| Matched Pair Design | A special case of the randomized block design, used when the experiment has only 2 treatment conditions and subjects can be grouped based on some blocking variable. Within each block, subjects are randomly assigned to the different treatments. Example: an athlete shoots with and without music |
| Confounding variables | Variables associated with both variables of interest, which can tempt us to think that one of the variables of interest causes the other |
| Important final thoughts about experiments and observational studies | Correlation does not imply causation. Biggest problems: experiments cannot always be done; observational studies are affected by confounding variables |
| Random assignment | Randomly assigning individuals to groups (treatments). Allows inference about cause and effect |
| Random selection | Randomly selecting individuals into the study. Allows inference about the population |
| Measures of Central Tendency | Mean, median, and mode |
| Skew right | Mean > median (usually). Examples: income, time-based metrics, count data |
| Skew left | Mean < median (usually). Examples: rating scales, age at death in developed countries, completion rates, professional experience levels |
| Measures of Variation | Range, standard deviation, and IQR (Q3 - Q1, also used to identify outliers) |
| IQR | IQR = Q3 - Q1. Lower fence = Q1 - (1.5 × IQR); upper fence = Q3 + (1.5 × IQR). Any value greater than the upper fence or less than the lower fence is an outlier |
| Pareto graph | A bar graph of categorical data with the bars arranged in descending order of frequency (highest -> lowest) |
| Pie chart | Depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category. Problem: slices for small categories can become nearly invisible |
| Histogram | A graph consisting of bars of equal width drawn adjacent to each other. The horizontal scale represents the classes of quantitative data values, and the vertical scale represents frequencies. The heights of the bars correspond to frequency values |
| Time Series graph | A graph of time-series data, which are numerical data collected at different points in time (such as monthly or yearly). Reveals information about trends over time |
| Frequency polygon | uses line segments connected to points located directly above the class midpoint values. It’s similar to a histogram but it uses line segments instead of bars |
| Cumulative Relative Frequency | A line graph of cumulative relative frequency that shows the percentile for each data value or class interval |
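
The Simple Random Sample card can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical list of 500 ticket holders as the sampling frame. `random.sample` draws without replacement, so every group of `n` units is equally likely to be chosen.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units without replacement; every group of n units is equally likely."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame standing in for a list of season ticket holders
ticket_holders = [f"holder_{i}" for i in range(1, 501)]
sample = simple_random_sample(ticket_holders, 10, seed=1)
print(sample)
```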
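
A Stratified Random Sample can be sketched as two steps: first group the units into strata, then take an SRS inside each stratum. This sketch assumes a hypothetical roster of `(student_id, class_year)` pairs and an equal number of students from each class year. Proportional allocation, where larger strata get more draws, is another common choice that the card does not specify.

```python
import random

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then take an SRS of n_per_stratum from each."""
    rng = random.Random(seed)
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    # An independent simple random sample inside every stratum
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical roster: (student_id, class_year)
years = ["first-year", "sophomore", "junior", "senior"]
roster = [(i, years[i % 4]) for i in range(200)]
sample = stratified_sample(roster, lambda student: student[1], 5, seed=2)
for year, students in sample.items():
    print(year, [sid for sid, _ in students])
```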
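
The Cluster Random Sample card differs from stratified sampling in one key way: only some groups are chosen at random, and then every unit in those groups is kept. This sketch assumes a small hypothetical dictionary of class rosters.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole clusters and keep every unit in them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # SRS of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical class rosters keyed by course name
classes = {
    "BIO101": ["Ana", "Ben", "Cy"],
    "MTH150": ["Dee", "Eli"],
    "ENG102": ["Fay", "Gus", "Hal", "Ivy"],
    "HIS110": ["Jo", "Kai"],
}
sample = cluster_sample(classes, 2, seed=3)
print(sample)
```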
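
The Systematic Sample card ("take every 5th") can be expressed as a slice of an ordered list. This sketch assumes a random starting position within the first segment, a common convention that the card does not spell out. The frame of 50 numbered units is hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Take every k-th unit of an ordered frame, starting at a random
    position inside the first segment of k units."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1 within the first segment
    return frame[start::k]    # same position in every later segment

frame = list(range(1, 51))    # hypothetical ordered list of 50 units
sample = systematic_sample(frame, 5, seed=4)
print(sample)
```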
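
A Completely Randomized Design only requires random assignment of treatments to experimental units. This sketch shuffles the units and deals them out in turn. It assumes you want group sizes as equal as possible, and the athletes and warm-up routine names are hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them out to treatments in turn so that
    group sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

athletes = [f"athlete_{i}" for i in range(1, 13)]
routines = ["dynamic", "static", "none"]  # hypothetical warm-up routines
assignment = completely_randomized(athletes, routines, seed=5)
print(assignment)
```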
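
In a Factorial Treatment Design, each treatment is one combination of levels, one level taken from each factor. `itertools.product` lists every combination. The three patrol strategies and three shift lengths below are hypothetical levels chosen for the police example.

```python
from itertools import product

# Hypothetical factor levels for the police response-time example
patrol_strategy = ["foot", "car", "bike"]
shift_length_hours = [8, 10, 12]

# Each treatment pairs one level of each factor
treatments = list(product(patrol_strategy, shift_length_hours))
print(len(treatments))  # 3 levels x 3 levels = 9 treatments
print(treatments)
```

The number of treatments is the product of the numbers of levels. Adding a third factor with 2 levels would raise the count to 18.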
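
A Randomized Block Design runs a separate completely randomized assignment inside each block. This sketch assumes hypothetical trainees who have been blocked by experience band, and two hypothetical treatment names.

```python
import random

def randomized_block(units, block_of, treatments, seed=None):
    """Run a completely randomized assignment separately inside each block."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize order within this block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical trainees: (name, years-of-experience band)
trainees = [("A", "0-2 yrs"), ("B", "0-2 yrs"), ("C", "0-2 yrs"), ("D", "0-2 yrs"),
            ("E", "3+ yrs"), ("F", "3+ yrs"), ("G", "3+ yrs"), ("H", "3+ yrs")]
plan = randomized_block(trainees, lambda t: t[1], ["new training", "old training"], seed=6)
print(plan)
```

When every block contains exactly two units and there are two treatments, this reduces to the matched pair design: one unit in each pair gets each treatment.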
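
The Measures of Central Tendency and skew cards can be checked with Python's `statistics` module. The income and rating values are hypothetical. `skew_direction` is a hypothetical helper that applies the mean-versus-median rule of thumb from the cards, so it is not a formal test of skewness.

```python
import statistics

def skew_direction(data):
    """Rule of thumb from the cards: compare the mean with the median."""
    mean, median = statistics.mean(data), statistics.median(data)
    if mean > median:
        return "right"
    if mean < median:
        return "left"
    return "roughly symmetric"

# Hypothetical incomes in thousands of dollars; one very large value
incomes = [32, 35, 38, 40, 40, 45, 47, 52, 60, 250]
mean_income = statistics.mean(incomes)        # 63.9
median_income = statistics.median(incomes)    # 42.5
mode_income = statistics.multimode(incomes)   # [40]
print(mean_income, median_income, mode_income, skew_direction(incomes))

# Hypothetical 0-10 ratings bunched near the top of the scale
ratings = [1, 7, 8, 8, 9, 9, 9, 10, 10]
print(skew_direction(ratings))
```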
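
Two of the Measures of Variation, range and standard deviation, are shown below on a hypothetical set of six values. `statistics.stdev` divides by n - 1, which is the sample standard deviation. `statistics.pstdev` divides by n, which applies when the data are the whole population.

```python
import statistics

# Hypothetical sample of six measurements
data = [4, 8, 6, 5, 3, 7]

data_range = max(data) - min(data)        # largest minus smallest: 5
sample_sd = statistics.stdev(data)        # divides by n - 1 (sample)
population_sd = statistics.pstdev(data)   # divides by n (whole population)
print(data_range, round(sample_sd, 3), round(population_sd, 3))
```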
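
The IQR fence rule can be computed directly. The quartile method is an assumption here. `statistics.quantiles(..., method="exclusive")` is one common convention, and textbooks and calculators sometimes compute Q1 and Q3 slightly differently. For the hypothetical data below, the "median of each half" method gives the same quartiles.

```python
import statistics

def iqr_fences(data):
    """Return Q1, Q3, IQR, and the lower/upper fences at 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="exclusive")
    iqr = q3 - q1
    return q1, q3, iqr, q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical values
q1, q3, iqr, lower_fence, upper_fence = iqr_fences(data)
# Outliers fall below the lower fence or above the upper fence
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)  # 4.0 8.5 4.5 -2.75 15.25 [30]
```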
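
A Pareto graph orders categories from highest to lowest frequency. `collections.Counter.most_common()` returns the counts in that order, and the text bars below stand in for a real chart. The survey responses are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (categorical data)
responses = ["pepperoni", "cheese", "pepperoni", "veggie", "cheese",
             "pepperoni", "sausage", "cheese", "pepperoni"]

# most_common() lists (category, count) pairs from highest to lowest count,
# which is the bar order a Pareto graph uses
pareto_order = Counter(responses).most_common()
for category, count in pareto_order:
    print(f"{category:<10} {'#' * count}")
```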
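
The Pie chart card says slice size is proportional to frequency. In degrees, that means each slice gets 360 × (count / total). The responses below are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses (categorical data)
responses = ["pepperoni", "cheese", "pepperoni", "veggie", "cheese",
             "pepperoni", "sausage", "cheese", "pepperoni"]

counts = Counter(responses)
total = sum(counts.values())
# Each slice's angle is its category's share of the full 360-degree circle
slice_degrees = {category: 360 * count / total for category, count in counts.items()}
print(slice_degrees)
```

Categories with a single response get only 40 degrees here. With many categories, such slices shrink toward the "invisible slice" problem the card mentions.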
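
The Histogram and Frequency polygon cards use the same frequency table. Bar heights are the class frequencies, and the polygon's points sit above each class midpoint. This sketch assumes hypothetical whole-number scores grouped into classes 50-59, 60-69, and so on. Each midpoint is taken as the average of the lower and upper class limits.

```python
def frequency_table(data, first_lower_limit, width, n_classes):
    """Tally whole-number data into equal-width classes such as 50-59, 60-69, ..."""
    table = []
    for i in range(n_classes):
        lower = first_lower_limit + i * width
        upper = lower + width - 1  # upper class limit for whole-number data
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return table

scores = [52, 58, 61, 64, 67, 69, 71, 72, 75, 78, 81, 84, 88, 93]  # hypothetical
table = frequency_table(scores, 50, 10, 5)

# Histogram: one bar per class, bar height = frequency
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {'#' * freq}")

# Frequency polygon: points above each class midpoint, joined by line segments
polygon_points = [((lower + upper) / 2, freq) for lower, upper, freq in table]
print(polygon_points)
```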
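
Cumulative relative frequency is a running total of the frequencies divided by the overall total. Plotting it against the class upper boundaries gives the line graph on the card. The class boundaries and frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class upper boundaries and frequencies (whole-number scores)
upper_boundaries = [59.5, 69.5, 79.5, 89.5, 99.5]
freqs = [2, 4, 4, 3, 1]

total = sum(freqs)
# Running total of frequencies divided by the overall count
cumulative_relative = [running / total for running in accumulate(freqs)]
for boundary, crf in zip(upper_boundaries, cumulative_relative):
    print(f"below {boundary}: {crf:.0%}")
```

The last value is always 1 (100%), because every observation falls below the top boundary. Each value reads as the percentile at that boundary.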