AP Stats Exam review
| Question | Answer |
|---|---|
| Categorical Data | Places individuals into groups or categories, usually described in words (e.g., color, political party, gender) |
| Quantitative data | Numerical data for which it makes sense to find an average and describe the spread; can be graphed with dotplots, stemplots, histograms, and boxplots |
| Bar Charts | For categorical data; the height of each bar (on the y-axis) shows the count or percentage for that category |
| Pie Charts | For categorical data; break one whole into its categories, always shown as percentages that add to 100% |
| Misleading Graphs | Exaggerate differences, for example by starting the y-axis at a convenient value instead of zero |
| Marginal Distributions | The totals in the margins of a two-way table (row or column totals), each expressed as a proportion of the grand total |
| Segmented Bar Graph | A bar graph with one bar per group, where each bar is divided into segments showing how the categories are spread among that group |
| Simpson's Paradox | When each subgroup shows one relationship but the combined data show the reverse relationship; usually happens when the subgroups are unbalanced in size |
| Stemplots | Like a histogram, but still show the actual data values. Split stems (each stem used twice, for leaves 0 to 4 and 5 to 9) spread out crowded data; a back-to-back stemplot compares two distributions |
| Dotplots | Work well for small data sets; each data value is shown as a dot above its position on a number line |
| Distribution | Describes a variable by the values it takes and how often it takes them |
| SOCS | S: Shape of the graph (symmetric, skewed, bimodal, uniform); O: Outliers (any unusual values); C: Center (mean, median); S: Spread (range, IQR, standard deviation) |
| Skewness | A distribution is skewed when it is stretched out, with a long tail, to the left or right; the side of the long tail names the skew |
| Histogram | Like a bar graph, but for quantitative data: values are grouped into equal-width intervals with touching bars; a value on a boundary is usually counted in the bar to its right |
| Percentiles | The percentage of data values at or below a particular value |
| Effect of Shape on Center | Symmetric: mean ≈ median; skewed left: mean is less than the median; skewed right: mean is greater than the median |
| Five-Number Summary | Min, Q1, median, Q3, max |
| Interquartile Range | IQR = Q3 − Q1; the spread of the middle 50% of the data |
| Boxplots | A graph of the five-number summary |
| Boxplot Outlier Rule | Any value above Q3 + 1.5(IQR) or below Q1 − 1.5(IQR) is an outlier |
| Standard Deviation | Roughly estimates how far, on average, the data values are from the mean |
| Resistant Statistics | A statistic is resistant if an outlier (or strong skewness) barely affects it, if at all (e.g., the median and IQR are resistant; the mean and standard deviation are not) |
| Variance | Is equal to (SD)² |
| Probability Distributions | Like a distribution, but how often the variable takes each value is expressed as a proportion (probability); the probabilities add to 1 |
| Ogives | AKA cumulative frequency graphs; show how much of the data is at or below a particular value |
| z values | The number of standard deviations a data value is from the mean, z = (x − mean)/SD; AKA standardized scores |
| Linear Transformations | New data = a + b(original data), i.e., y = a + bx. Adding a constant a changes only the measures of center and location (they shift by a); spread is unchanged. Multiplying by a constant b multiplies the measures of center, location, and spread by b (spread by the absolute value of b); shape does not change |
| Density Curves | A curve that is always on or above the x-axis, where area under the curve represents proportion, so the total area under the curve is 1 (100%) |
| Discrete | If the values can be counted or listed completely |
| Continuous | If the variable can take any value in an interval, so its possible values cannot be listed |
| The normal distribution | A symmetric, single-peaked, bell-shaped density curve, described completely by its mean and standard deviation |
| The empirical rule | In a normal distribution, about 68%, 95%, and 99.7% of the data fall within 1, 2, and 3 standard deviations of the mean |
| The standard normal distribution | The normal distribution with a mean of zero and an SD of one |
| Scatterplot | Shows relationships between two quantitative variables by pairing two related values as coordinates and graphing them |
| Explanatory Variables | AKA input; helps predict the response variable, goes on the x-axis |
| Response Variable | AKA output; the variable that results from the input or explanatory variable, goes on the y-axis |
| Graph variable one against variable two means | Variable one is on the y-axis and variable two is on the x-axis |
| DOSS | D: Direction (upward or downward; positive or negative); O: Outliers (unusual points); S: Shape (linear or curved); S: Strength (weak, moderate, or strong) |
| R-value | AKA correlation, r: a unitless number between −1 and 1 that describes the strength and direction of the linear relationship between two quantitative variables |
| Correlation vs. Causation | Correlation does not imply causation; two variables may be related because of a confounding variable |
| Correlation | Used for two quantitative variables |
| Association | For categorical variables |
| Interpret Slope | For each additional one-unit increase in x, the predicted y changes by (slope) units |
| Interpret y-int | When x is zero, the predicted y is the y-intercept |
| Least Squares Regression Line | AKA LSRL. The one line that makes the sum of the squared residuals as small as possible |
| Residuals | The differences between the actual y-values and the y-values predicted by the LSRL: residual = y − ŷ |
| Residual Plot | A scatterplot that pairs the residuals (errors) with the x-values of the data; shows how large the residuals are, and a random scatter with no pattern suggests a linear model fits |
| R-squared | The proportion of the variation in y that is explained by the linear relationship with x |
| Interpret r-squared | (r²)% of the variation in y can be explained by the linear relationship with x |
| On a scatterplot; Outlier | Any point that has a large residual after the line has been computed for the data |
| On a scatterplot; Influential point | Any point that, if removed, would have a large effect on the slope or correlation |
| Statistic | Any value that describes or summarizes a sample |
| Parameter | Any value that describes or summarizes a population |
| Census | Data collected from every individual in the entire population |
| Sampling Frame | The list of individuals from which the sample is actually drawn |
| Bias | A sampling method is biased if it tends to produce estimates that are systematically too high or too low compared to the population parameter |
| Bias is not a sample ____ issue, but a sampling ____ issue | Size; method |
| Voluntary Response Bias | Subjects aren't randomly chosen; instead, individuals choose whether to provide data |
| Nonresponse Bias | Individuals are randomly selected, but a meaningful proportion of those selected do not respond |
| Response Bias | When the responses themselves are suspect or inaccurate (e.g., subjects underreport illegal activity) |
| Undercoverage Bias | When the sampling method never gives some subgroup a chance to be in the sample |
| Convenience Sample | Using data from individuals who are simple or easy to reach |
| Simple Random Sample | SRS; every group of n individuals in the population has the same chance of being selected, so each individual also has the same chance |
| How to SRS | 1) Assign each individual in the population a number, mix the numbers up, and draw out the desired number of subjects, OR 2) use a random number table or a calculator to choose the numbers (skipping repeats) |
| Stratified Sample | Use if you think some variable (such as a possible confounding variable) is associated with the variable of interest. Divide subjects into groups of similar individuals ("strata"), then take an SRS from each stratum, often in proportion to its share of the population |
| Stratified samples have ___ variation from sample to sample than SRS | Less |
| Cluster Sampling | Divide the population into "clusters" that each contain a mix of subgroups, randomly select some clusters, then sample every individual in the chosen clusters |
| Systematic Sample | Randomly select a starting point among the first k items, then sample every kth item after that |
| Observational Study | You compare two or more groups according to some explanatory variable(s) and measure a response variable to "observe" the differences, without imposing any treatment |
| You ____ deduce a cause-and-effect relationship between the variables in an observational study | Cannot |
| Experiments | We control the application of the explanatory variable(s) through random assignment of treatments to subjects |
| We ___ conclude a cause and effect relationship between variables in an experiment. | Can |
| Retrospective Observational Study | Find subjects with the response of interest and look back in time to see how they differ in the explanatory variable |
| Prospective Observational Study | Find subjects with the explanatory variable of interest and follow them into the future to see how they differ in the response variable |
| Good Experiments | Compare two or more treatments, use random assignment, control other variables, and use replication |
| Experimental Units | The individuals or objects that are randomly assigned to the treatments |
| Treatments | Specific conditions applied to the experimental units: a level of one factor, or a combination of levels of several factors |
| Control Group | Establishes a baseline for comparison; if given a placebo, it also measures the placebo effect |
| Placebo | A fake treatment to see if the mere participation in the experiment produces change in behavior |
| A placebo is ____ than just a control group that gets nothing | Better |
| Blind | Subjects don't know which treatment they get |
| Double-blind | Researchers and subjects don't know who gets what treatment |
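| Block Design | Subjects are first grouped into blocks by known similarities, then randomly assigned to treatments within each block |
| Matched Pairs | Each subject is paired with themselves (receiving both treatments) or with a similar subject; treatments are randomly assigned within each pair |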
| Completely Randomized | No grouping ahead of time; subjects are randomly assigned to treatment groups |
| Confounding variables | A variable associated with both the explanatory and response variables, so their effects on the response cannot be separated |
| Cross-over matched pairs | Each subject receives both treatments in a random order, serving as their own match |
| Statistically Significant | A result larger than what would reasonably happen just by chance; the p-value is less than alpha |
| P(A and B) = | P(A) × P(B\|A) |
| P(B\|A) = | P(B) if A and B are independent |
| P(A or B) = | P(A) + P(B) - P(A and B) |
| P(A and B) = | 0 if A and B are mutually exclusive |
| Independence | Two events are independent if the probability of the second event is the same whether or not the first event occurs |
| sample space | The list of all possible outcomes; the probabilities of all the outcomes sum to one, and each probability is between 0 and 1 |
| probability distribution | The list of each outcome in the sample space together with its probability |
| Complements | Everything in the sample space besides that event; P(not A) = 1 − P(A) |
| Conditional Probabilities | P(B\|A) vs. P(B): the probability of B given that A has occurred, which factors in extra information, or an additional "condition"; P(B\|A) = P(A and B)/P(A) |
| Probability of A or B | P(A or B) = P(A) + P(B) − P(A and B), also written as P(A ∪ B) |
| Discrete Variables | Take on a countable number of values; they can be listed |
| Continuous Variables | Can take on an infinite number of outcomes, best described by intervals rather than specific outcomes |
| You can only combine (add or subtract) variances for ______ ______ | independent random variables |
| Binomial Distributions conditions | 1) exactly two outcomes of interest (success or failure) for each trial, 2) a fixed number of trials, 3) the same probability of success on each trial, 4) trials are independent of each other |
| Binomial PDF | Probability of exactly k successes out of n trials |
| Binomial CDF | Probability of k or fewer successes out of n trials |
| 10% condition | If the sample is less than 10% of the population, then sampling without replacement is "close enough" to having the same probability of success on each trial |
| Large Counts Condition | np and n(1 − p) are both greater than or equal to 10 |
| Geometric distributions | Have a fixed probability of success, independent trials, and exactly two outcomes of interest, BUT we're interested in the trial on which the first success occurs |
| Sampling Distributions | Describes the set of all possible values a statistic can have for a given sample size. They are very predictable in the long run even though each sample statistic is unpredictable |
| Statistics are said to be _____ if, in the long run, they average out to equal the population parameter | unbiased |
| The _____ (SD) of a sample statistic depends only on the sample size, not the population size, as long as the sample is less than 10% of the population | variability |
| The sampling distribution of a sample proportion is ____ ____ if np and n(1 − p) are both greater than or equal to 10 | approx. normal |
| Central Limit Theorem | The sampling distribution of a sample mean is approx. normal when n is greater than or equal to 30, whatever the shape of the population |
| Confidence Intervals | Estimate population parameters (means, proportions) by giving an interval where we believe the parameter lies and how confident we are about it |
| Significance tests | Assess the strength of the evidence against a claimed value of a population parameter (the null hypothesis) |
| We assume our null hypothesis is ___ unless we find significant evidence to the contrary | true |
| Point estimate | Using your sample statistic as the single best estimate of an unknown population parameter |
| General Formula for z-int | Statistic ± multiplier (SE) |
| Multiplier | Same as the critical value (e.g., z*) |
| Interpret a confidence interval | We are __% confident that the true (parameter of interest) lies between __ and __ (units) |
| Conditions for z-int | 1) random sample, 2) population is at least 10 times the sample size, 3) at least 10 successes and 10 failures |
| ACDC | A: Announce, What test? Ho? Ha? P? M? Alpha? C: Conditions, Known or assumptions? D: Do, Procedure? P=? df? z? t? C: Conclude, Interpret? Reject Null? Sig Evidence? |
| Conditions for t-int | 1) random sample, 2) population is at least 10 times the sample size, 3) the normal condition |
| The normal Condition | Either the population is known to be normal, n is greater than or equal to 30 or we graph the data and there is no strong skewness or outliers |
| We use a two-sample t-int when there are __ independent random samples | 2 |
| Degrees of freedom | DF; for one-sample t methods, df = n − 1 |
| Samples ____ prove any hypothesis to be true or false | cannot |
| p-value | The probability of getting your test statistic or one more extreme, assuming the null is true |
| If the p-value is small (less than alpha) | reject the null |
| Null Hypothesis | Ho, always has = |
| Alternative hypothesis | Ha; uses an inequality sign (<, >, or ≠) |
| The ___ the p-value, the more significant the statistical evidence is against the null hypothesis | smaller |
| Type one error | The null is actually true but we reject it |
| Type two error | The alternative is true, but we fail to reject the null |
| power of a test | The probability of not making a Type II error: when the alternative is true, we correctly reject the null |
| one-proportion-z-test conditions | 1) random sample, 2) sample is less than 10% of population, 3) np₀ and n(1 − p₀) are both at least 10, using the null value p₀ |
| The effect of a two-tailed test is | typically to double the p-value of the one-tailed test |
| conditions for one-sample t-test | 1) random sample, 2) sample is less than 10% of population, 3) population approx. normal or n is at least 30 |
| Matched pairs t-test | Just like a one-sample t-test, but on the differences within each pair; looks for significant evidence that the mean difference is not zero |
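| Conditions for 2-prop-z-int | 1) each is a random sample, 2) each population is at least 10 times its sample size, 3) each sample has at least 10 successes and 10 failures |
| Conditions for 2-prop-z-test | 1) each is a random sample, 2) each population is at least 10 times its n, 3) at least 10 successes and 10 failures per sample |
| Pooled proportions | Combine both samples (total successes ÷ total sample size) to give the best estimate of the common proportion assumed by the null hypothesis |
| Degrees of freedom for 2-sample t-int | Use the df given by technology, or the conservative df = the smaller of n1 − 1 and n2 − 1; df = n1 + n2 − 2 applies only to pooled procedures |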
| Goodness of fit test | We take one sample and look at the distribution of a single categorical variable |
| Homogeneity test | We take samples from several populations and categorize data by a single categorical variable |
| Test of Independence | We take one sample and then classify data by two categorical variables |
| GOF Hypothesis | Null is the proposed distribution is correct. Alt is the distribution is incorrect |
| Conditions for all χ² tests | 1) random sample rule, 2) 10% condition, 3) large counts rule (all expected counts at least 5) |
| χ² tests use ___ to show what the counts should be if the null hypothesis is true | expected values |
| Df for χ² | Goodness of fit: categories − 1; two-way tables (homogeneity, independence): (rows − 1)(columns − 1) |
| Hypotheses for χ² test of homogeneity | Null: the distributions of your categorical variable are the same for each population. Alt: the distributions are not all the same |
| χ² test of independence hypotheses | Null: no association between the two variables. Alt: there is an association |
| Hypothesis test for the slope | Null: there is no linear relationship between x and y (slope = 0). Alt: there is a linear relationship |
| A slope of zero signifies | no linear relationship |
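| Linear relationship between quantitative variables Conditions | 1) random sample, 2) 10% condition, 3) approx. normal residuals, 4) the true relationship is linear, 5) residuals have roughly equal spread for all x |
| Df for inference about the slope | n − 2 |
| Confidence interval for the slope of a linear regression | stat ± multiplier (SE): b ± t*(SE of b), with df = n − 2 |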
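The five-number summary, IQR, and 1.5(IQR) outlier rule can be checked with a short Python sketch. It assumes the "median of each half" method for quartiles (the middle value is left out of both halves when n is odd), a common textbook convention; other software may compute quartiles slightly differently. The data set and the helper names are made up for illustration.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-each-half method.

    When n is odd, the middle value is excluded from both halves.
    Assumes the data set has at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower_half = xs[:half]
    upper_half = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return xs[0], median(lower_half), median(xs), median(upper_half), xs[-1]

def iqr_outliers(data):
    """Return the values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # made-up data
print(five_number_summary(data))      # (2, 4.0, 6, 8.5, 30)
print(iqr_outliers(data))             # [30]
```

Here IQR = 8.5 − 4.0 = 4.5, so the upper fence is 8.5 + 6.75 = 15.25 and 30 is flagged as an outlier.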
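Standard deviation, variance, and z-scores can be sketched with Python's built-in `statistics` module. Note that `stdev` and `variance` compute the sample versions (dividing by n − 1), which is what a calculator reports as Sx. The scores are invented, and `z_score` is a hypothetical helper, not a library function.

```python
from statistics import mean, stdev, variance

def z_score(x, center, spread):
    """Number of standard deviations x lies from the center (hypothetical helper)."""
    return (x - center) / spread

scores = [70, 75, 80, 85, 90]   # made-up sample
m = mean(scores)                # 80
s = stdev(scores)               # sample SD, divides by n - 1
var = variance(scores)          # sample variance, equals s ** 2

print(m, round(s, 3), var)
print(round(z_score(90, m, s), 3))   # 90 is about 1.265 SDs above the mean
```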
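A quick numeric check of the linear transformation rules, assuming a Celsius-to-Fahrenheit conversion (new = 32 + 1.8 × old) as the example; the temperatures are made up. Adding a constant moves the center but not the spread, while multiplying scales both.

```python
from statistics import mean, median, stdev

celsius = [10, 15, 20, 25, 30]                 # made-up temperatures
fahrenheit = [32 + 1.8 * c for c in celsius]   # new = a + b(old), a = 32, b = 1.8
shifted = [c + 5 for c in celsius]             # adding a constant only

# Adding a constant moves center and location but leaves spread alone.
print(mean(shifted) - mean(celsius), median(shifted) - median(celsius))
print(stdev(shifted) - stdev(celsius))

# Multiplying by b scales center, location, and spread; adding a then shifts center.
print(mean(fahrenheit), 32 + 1.8 * mean(celsius))
print(stdev(fahrenheit), 1.8 * stdev(celsius))
```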
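The empirical rule can be compared with exact normal-curve areas using `statistics.NormalDist` (Python 3.8+). The N(70, 10) population below is a hypothetical example used to turn a value into a z-score and a percentile.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Exact areas behind the 68-95-99.7 rule
for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))

# Percentile of the value 85 in a hypothetical N(70, 10) population
scores = NormalDist(mu=70, sigma=10)
z = (85 - scores.mean) / scores.stdev   # standardize: 1.5 SDs above the mean
percentile = scores.cdf(85)             # proportion at or below 85
print(z, round(percentile, 4))
```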
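The LSRL, residuals, r, and r² can be computed from sums of squares. This sketch uses made-up data and a hypothetical `least_squares` helper; the slope formula b = Sxy/Sxx is equivalent to b = r(s_y/s_x), and the line passes through (x̄, ȳ), so the residuals sum to zero.

```python
from math import sqrt

def least_squares(x, y):
    """Return (slope, intercept, r) for the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx                  # equivalent to b = r * (s_y / s_x)
    intercept = y_bar - slope * x_bar  # the LSRL passes through (x_bar, y_bar)
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values
b, a, r = least_squares(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # actual - predicted

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}, r^2 = {r * r:.2f}")
print([round(e, 2) for e in residuals])
```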
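The addition rule, multiplication rule, conditional probability, and independence can all be checked by counting outcomes in a standard 52-card deck. This is a sketch using exact fractions; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))   # 52 equally likely cards (rank, suit)

def prob(event):
    """P(event) = favorable outcomes / total outcomes, as an exact fraction."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))

def is_heart(card):
    return card[1] == "hearts"

def is_face(card):
    return card[0] in ("J", "Q", "K")

p_heart = prob(is_heart)                                        # 1/4
p_face = prob(is_face)                                          # 3/13
p_heart_and_face = prob(lambda c: is_heart(c) and is_face(c))   # 3/52

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_heart_or_face = p_heart + p_face - p_heart_and_face
# Conditional probability: P(B|A) = P(A and B) / P(A)
p_face_given_heart = p_heart_and_face / p_heart

print(p_heart_or_face, prob(lambda c: is_heart(c) or is_face(c)))  # both 11/26
print(p_face_given_heart == p_face)  # True: the events are independent
```

Because P(face | heart) equals P(face), knowing the card is a heart does not change the chance it is a face card, which is exactly the definition of independence.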
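Binomial and geometric probabilities follow directly from their formulas. The helpers below are hypothetical, named to mirror the calculator's binompdf, binomcdf, and geometpdf; the coin and die examples are illustrative.

```python
from math import comb

def binom_pdf(n, p, k):
    """P(X = k): exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k): k or fewer successes in n trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

def geom_pdf(p, k):
    """P(first success on trial k) for a geometric setting."""
    return (1 - p) ** (k - 1) * p

# 10 fair coin flips: exactly 3 heads, and at most 3 heads
print(binom_pdf(10, 0.5, 3))   # 120/1024 = 0.1171875
print(binom_cdf(10, 0.5, 3))   # 176/1024 = 0.171875
# Rolling a fair die: first six on the third roll
print(geom_pdf(1 / 6, 3))      # (5/6)^2 (1/6), about 0.1157
```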
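An SRS and a systematic sample can be drawn with Python's `random` module, playing the role of the random number table or calculator. This is a sketch assuming a hypothetical roster labeled 1 to 100; `random.sample` draws without repeats, so every group of n labels is equally likely.

```python
import random

def simple_random_sample(population, n, rng=random):
    """SRS: every group of n individuals is equally likely; no repeats."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng=random):
    """Random start among the first k individuals, then every kth one."""
    start = rng.randrange(k)
    return population[start::k]

students = list(range(1, 101))   # hypothetical roster labeled 1 to 100
rng = random.Random(2024)        # seeded so the run is repeatable

print(sorted(simple_random_sample(students, 10, rng)))
print(systematic_sample(students, 10, rng))
```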
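A one-proportion z-interval (statistic ± multiplier × SE) and a one-proportion z-test can be sketched with `statistics.NormalDist`. The survey numbers (60 "yes" out of 100, testing p₀ = 0.5) are hypothetical. Note the interval's SE uses p̂, while the test's SE uses p₀ because the test assumes the null is true.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()   # mean 0, SD 1

def one_prop_z_interval(successes, n, confidence=0.95):
    """Statistic +/- multiplier (SE), with SE from the sample proportion."""
    p_hat = successes / n
    z_star = STD_NORMAL.inv_cdf((1 + confidence) / 2)   # critical value, about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se

def one_prop_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p-value); SE uses p0 because we assume the null is true."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    if alternative == "greater":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":
        p_value = STD_NORMAL.cdf(z)
    else:   # two-sided: double the tail area
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value

low, high = one_prop_z_interval(60, 100)
z, p = one_prop_z_test(60, 100, p0=0.5)
print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"z = {z:.2f}, two-sided p-value = {p:.4f}")
```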
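A matched pairs t-test is a one-sample t-test on the differences, with df = n − 1. This sketch computes only the t statistic and df, assuming the p-value is then read from a t table or calculator (Python's standard library has no t distribution). The before/after data and the `one_sample_t` helper are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(data, mu0):
    """Return (t, df) for H0: mu = mu0, using t = (x_bar - mu0) / (s / sqrt(n))."""
    n = len(data)
    se = stdev(data) / sqrt(n)
    return (mean(data) - mu0) / se, n - 1

# Matched pairs: run the one-sample t on the differences, with mu0 = 0.
before = [10, 12, 9, 11, 13]   # made-up paired measurements
after = [12, 13, 11, 11, 15]
differences = [a - b for a, b in zip(after, before)]   # [2, 1, 2, 0, 2]

t, df = one_sample_t(differences, mu0=0)
print(f"t = {t:.2f}, df = {df}")   # compare to a t table or calculator for the p-value
```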
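The χ² statistic is the sum of (observed − expected)²/expected. This sketch computes the statistic and df for a goodness-of-fit test and for a two-way table, where each expected count is row total × column total ÷ grand total. The die rolls and the 2×2 table are made up, and the p-value is assumed to come from a χ² table or calculator.

```python
def chi_square_stat(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit: is a die fair? 60 hypothetical rolls.
observed = [8, 12, 9, 11, 6, 14]
expected = [60 / 6] * 6                   # 10 per face if H0 (fair die) is true
assert all(e >= 5 for e in expected)      # large counts: every expected count >= 5
gof_stat = chi_square_stat(observed, expected)
gof_df = len(observed) - 1                # categories - 1

# Two-way table: expected = row total * column total / grand total
table = [[20, 30],
         [30, 20]]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)
expected_table = [[r * c / grand_total for c in col_totals] for r in row_totals]
two_way_stat = chi_square_stat(
    [o for row in table for o in row],
    [e for row in expected_table for e in row],
)
two_way_df = (len(table) - 1) * (len(table[0]) - 1)   # (rows - 1)(columns - 1)

print(round(gof_stat, 2), gof_df)
print(two_way_stat, two_way_df)
```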