AP Stats Exam review

Question	Answer
Categorical Data	Data described by words or labels rather than numbers, e.g. color, political party, gender
Quantitative Data	Data that is numerical; it has an average, we can talk about its spread, and it can be graphed.
Bar Charts	For categorical data; the y-axis shows the total or percentage for each category
Pie Charts	For categorical data; breaks one total down into subgroups that add to 100%; always uses percentages
Misleading Graphs	Exaggerate results, for example by having the y-axis start at a convenient point other than zero
Marginal Distributions	The numbers in the margins of a two-way table (the row and column totals), out of the overall total.
Segmented Bar Graph	A bar graph where each bar is split into segments showing how the various categories are spread among multiple groups
Simpson's Paradox	When the subgroups show one relationship yet the overall data show the reverse relationship. Can happen when the subgroups are unbalanced
Stemplots	Like a histogram but still shows the actual values. The stems can be split, and a back-to-back stemplot compares two distributions
Dotplots	Work well for small data sets. Each data point is a dot placed above its value on a number line.
Distribution	A way of describing a variable by what values it takes on and with what frequency
SOCS	S: Shape of the graph (symmetric, bimodal, skewed, uniform) O: Outliers, any unusual values C: Center, mean or median S: Spread, e.g. IQR from the five-number summary, or SD
Skewness	AKA stretched out to either the left or the right
Histogram	Similar to a bar graph but for quantitative data; each value is counted in the bar whose left edge is directly to the left of it.
Percentiles	The percentage of data at or below a particular value
Effect of Shape on Center	Symmetric: mean ≈ median; skewed left: mean is less than median; skewed right: mean is greater than median
Five number Summary	Min, Q1, median, Q3, Max
Interquartile Range	IQR is Q3-Q1
Boxplots	A graph of the five number summary
Boxplot Outlier Rule	Anything above Q3+1.5(IQR) or below Q1-1.5(IQR)
Standard Deviation	Roughly estimates how far, on average, the data values are from the mean
Resistant Statistics	A statistic is resistant if including an outlier (or strong skewness) affects it barely or not at all
Variance	Is equal to (SD)²
Probability Distributions	Like a distribution but the frequencies that the values of that variable take on are expressed as proportions
Ogives	AKA cumulative frequency graphs. Show how much of the data is at or below a particular data value
z values	The number of standard deviations a data point is from the mean. AKA standardized scores
Linear Transformations	New data = a + b(original data), or y = a + bx. If you add a constant "a" to every value in the data set, only the measures of center and location change; the spread stays the same. If you multiply by a constant "b", then all summary measures change
Density Curves	A curve that lies on or above the x-axis where area = proportion, so the total area under the curve = 100%
Discrete	If the values can be counted or listed completely
Continuous	If the values cannot be counted or listed; they can fall anywhere in an interval, so there are unlimited outcomes
The normal distribution	A symmetric, bell-shaped density curve. The proportion of data in an interval depends on how many standard deviations the interval lies from the mean
The empirical rule	In a normal distribution, the percent of data encompassed within ±1, ±2, and ±3 SD of the mean: 68%, 95%, 99.7%
The standard normal distribution	Has a mean of zero and an SD of one
Scatterplot	Shows relationships between two quantitative variables by pairing two related values as coordinates and graphing them
Explanatory Variables	AKA input; helps predict the response variable, goes on the x-axis
Response Variable	AKA output; the variable which is a result of the input or explanatory variable, goes on the y-axis
Graph variable one against variable two means	Variable one is on the y-axis and variable two is on the x-axis
DOSS	D: Direction, Upward? Downward? Positive or negative? O: Outliers, unusual points S: Shape, Linear? Curved? S: Strength, Weak? Strong? Moderate?
R-value	A unitless number that describes the strength and direction of the linear relationship between two quantitative variables.
Correlation vs. Causation	Correlation does not prove causation; two variables may be related because of confounding variables
Correlation	Is for two quantitative variables
Association	For categorical variables
Interpret Slope	For each additional one-unit increase in (x), the predicted change in (y) is (slope)
Interpret y-int	If x is zero, then we predict that y will be (y-int)
Least Squares Regression Line	AKA LSRL. The one line that makes the sum of the squared residuals as small as possible
Residuals	The differences between the actual y-coordinates and the values the LSRL predicts (actual - predicted)
Residual Plot	A scatterplot that pairs the residuals with the x-values of the data. Shows how large the residuals are (errors)
R-squared	The proportion of the variation in y that is explained by the linear relationship with x
Interpret r-squared	(r²)% of the variation in y can be explained by the linear relationship with x
On the scatterplot; Outlier	Is any point that has a large residual after the line has been computed for the data
On a scatterplot; Influential point	Any point that, if taken out, would have a large effect on the slope or correlation
Statistics	Any value that describes or summarizes a sample
Parameter	Any value that describes or summarizes a population
Census	The data is the entire population
Sampling Frame	The part of the population from which the sample was actually drawn
Bias	A sampling method is biased if we suspect the method used will produce estimates that are predictably too high or too low compared to the population
Bias is not a sample ____ issue, but a sampling ____ issue	Size; method
Voluntary Response Bias	Subjects aren't randomly chosen; rather, subjects choose whether they will provide data.
Non response Bias	Has random selection, but a meaningful proportion of those selected did not respond
Response Bias	When the data itself is highly suspect or inaccurate (ex. illegal activity)
Under coverage Bias	When the sampling method never gives a subgroup a chance to be in the sample
Convenience Sample	Using data that is simple or easy to gather
Simple Random Sample	SRS; each item in the population has the same chance of being selected, and every possible group of the chosen size is equally likely to be the sample
How to SRS	1) Assign each item in the population a number, mix them up, then draw out the desired number of subjects OR 2) Use a random number table or a calculator.
Stratified Sample	Use if you think there might be a confounding variable associated with the variable of interest. Group subjects into "strata", then randomly sample from each stratum in the proportion that it makes up of the population
Stratified samples have ___ variation from sample to sample than SRS	Less
Cluster Sampling	Break down the population into "clusters" of mixed subgroups, then randomly select which clusters to sample from.
Systematic Sample	You randomly select the first item, then sample every kth item after it.
Observational Study	You compare two or more groups according to some explanatory variable(s) and measure a response variable to "observe" the differences; no treatments are assigned.
You ____ deduce a cause and effect relationship between the variables in an observational study	Cannot
Experiments	We control the application of the explanatory variables through random assignment of treatments to subjects.
We ___ conclude a cause and effect relationship between variables in an experiment.	Can
Retrospective Observational Study	Find subjects with desired response variables and look back to see how they differ in the explanatory variables
Prospective Observational Study	Find subjects with desired explanatory variables and follow them into the future to see how they differ in the response variables
Good Experiments	Have more than one treatment, random assignment, control of other variables, and replication
Experimental Units	Whatever is being randomly assigned to the treatments of the explanatory variables
Treatments	Specific levels, or combinations of levels, of the factors in an experiment
Control Group	Establishes a baseline; can measure the placebo effect
Placebo	A fake treatment to see if the mere participation in the experiment produces change in behavior
A placebo is ____ than just a control group that gets nothing	Better
Blind	Subjects don't know which treatment they get
Double-blind	Researchers and subjects don't know who gets what treatment
Block Design	Subjects are grouped by known similarities, then treatments are randomly assigned within each group (block)
Matched Pairs	Paired with yourself or pairs of similar subjects
Completely Randomized	No grouping ahead of time; subjects are randomly assigned to treatment groups
Confounding variables	A variable associated with the explanatory variable that may help explain the observed association or may itself cause the response
Cross-over matched pairs	Each subject is paired with themselves: every subject receives both treatments
Statistically Significant	More than what would reasonably happen just due to chance; the p-value is less than alpha
P(A and B) =	P(A) x P(B|A)
P(B|A) =	P(B) if A and B are independent
P(A or B) =	P(A) + P(B) - P(A and B)
P(A and B) =	0 if A and B are mutually exclusive
Independence	Two events are independent if the probability of the second event is unchanged regardless of whether the first event has happened
sample space	A list of all the possible outcomes; the probabilities of all the outcomes sum to one, and each probability is between 0 and 1
probability distribution	The sample space together with the probability of each outcome in it
Complements	Everything in the sample space besides that event
Conditional Probabilities	P(B) vs. P(B|A). Basically a probability that factors in extra information, or an additional "condition"
Probability of A or B	P(A or B) = P(A) + P(B) - P(A and B), also written as P(A ∪ B)
Discrete Variables	Can take on a countable (often finite) number of outcomes; they can be listed
Continuous Variables	Can take on an infinite number of outcomes, best described by intervals rather than specific outcomes
You can only combine variances for ______ ______	independent variables
Binomial Distributions conditions	1) exactly two outcomes of interest for each trial, 2) a fixed number of trials, 3) same probability of success for each trial, 4) each trial is independent of the others
Binomial PDF	Probability of exactly k successes out of n trials
Binomial CDF	Probability of k or fewer successes out of n trials
10% condition	If the sample is less than 10% of the population, then sampling without replacement is "close enough" to having the same probability of success on each trial
Large Counts Condition	If np and n(1-p) are both greater than or equal to 10, the distribution is approx. normal
Geometric distributions	Have a fixed probability of success, each trial is independent, exactly two outcomes of interest BUT we're interested in our first success being on a certain trial
Sampling Distributions	Describes the set of all possible values a statistic can have for a given sample size. They are very predictable in the long run even though each sample statistic is unpredictable
Statistics are said to be _____ if in the long run they average out to equal the population parameter	unbiased
The _____ (SD) of a sample statistic is only dependent on the sample size, not the population size	variability
A sampling distribution is ____ ____ if np and n(1-p) are both greater than or equal to 10	approx. normal
Central Limit Theorem	The sampling distribution of the sample mean is approx. normal when n is greater than or equal to 30, whatever the shape of the population
Confidence Intervals	Estimate population parameters (means, proportions) by giving an interval where we believe the parameter might lie and how confident we are about it
Significance tests	Assess the weight of the evidence against the null hypothesis about the population parameter being true
We assume our null hypothesis is ___ unless we find significant evidence to the contrary	true
Point estimate	Giving your sample statistic as the single best estimate of an unknown population parameter
General Formula for z-int	Statistic ± multiplier × (SE)
Multiplier	Same as critical value
Interpret a confidence interval	We can be __% confident that the actual (parameter of interest) lies between __ and __ (units)
Conditions for z-int	1) random sample, 2) population at least 10 times the sample size (10% condition), 3) at least 10 successes and 10 failures
ACDC	A: Announce, What test? Ho? Ha? P? M? Alpha? C: Conditions, Known or assumptions? D: Do, Procedure? P=? df? z? t? C: Conclude, Interpret? Reject Null? Sig Evidence?
Conditions for t int	1) random sample, 2) population at least 10 times the sample size (10% condition), 3) the normal condition is met
The normal Condition	Either the population is known to be normal, n is greater than or equal to 30, or we graph the data and there is no strong skewness or outliers
We use a t-int when there are __ random variables (both the sample mean and the sample SD vary from sample to sample)	2
Degree of freedom	DF, for one sample methods is df=n-1
Samples ____ prove any hypothesis to be true or false	cannot
p-value	The probability of getting your test statistic or one more extreme, assuming the null is true
If the p-value is small	reject the null
Null Hypothesis	Ho, always has =
Alternate hypothesis	Ha, will involve an inequality sign
The ___ the p-value, the more significant the statistical evidence is against the null hypothesis	smaller
Type one error	The null is actually true but we reject it
Type two error	The alternate is true but we fail to reject the null
power of a test	The probability of not making a type 2 error. So the alternate is true and we conclude that
one-proportion-z-test conditions	1) random sample, 2) sample is less than 10% of population, 3) np and n(1-p) are both at least 10
The effect of a two tailed test is	typically to double the one-tailed p-value
conditions for one-sample t-test	1) random sample, 2) sample is less than 10% of population, 3) population approx. normal or n is at least 30
Matched pairs t-test	Just like a one-sample t-test but on the differences within each pair. Looks for significant evidence that the mean difference is not zero
Conditions for 2-prop-z-int	1) each is a random sample, 2) each sample is less than 10% of its population, 3) each sample has at least 10 successes and 10 failures
Conditions for 2-prop-z-test	1) each is a random sample, 2) each population is greater than 10 times n, 3) at least 10 successes and 10 failures per sample
Pooled proportions	Gives the best estimate of the actual proportion when the null says the two proportions are equal: combined successes divided by combined sample size
Degree of freedom for 2-sample t-int	n1 + n2 - 2 if the variances are pooled; otherwise use the calculator's df or, conservatively, the smaller of n1 - 1 and n2 - 1
Goodness of fit test	We take one sample and look at the distribution of a single categorical variable
Homogeneous distribution test	We take samples from several populations and categorize data by a single variable
Test of Independence	We take one sample and then classify data by two categorical variables
GOF Hypothesis	Null is the proposed distribution is correct. Alt is the distribution is incorrect
Conditions for all χ² tests	1) random sample rule, 2) 10% condition, 3) large counts rule (all expected counts at least 5)
χ² tests use ___ to show what should be true if the distributions are correct	expected values
Df for χ²	Goodness of fit: categories - 1. Two-way tables (homogeneity, independence): (rows - 1)(columns - 1)
Hypothesis for χ² test of homogeneous distributions	Null is the distributions of your categorical variable are the same for each population. Alt is the distributions are not the same
χ² test of independence hypothesis	Null is no association, Alt is association
Hypothesis test for the slope	Null, there is no linear relationship between x and y Alt, there is a linear relationship
A slope of zero signifies	no linear relationship
Linear relationship between quantitative variables Conditions	1) random sample, 2) 10% condition, 3) approx. normal residuals
Df for linear regression relationship	n-2
Linear regression relationship formula	stat ± multiplier × (SE)
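
The "Binomial PDF" and "Binomial CDF" cards describe what a calculator's binompdf and binomcdf return. A minimal Python sketch of the same two quantities, using only the standard library, might look like this; the function names `binom_pdf` and `binom_cdf` are illustrative, not a real library API.

```python
from math import comb

def binom_pdf(n, p, k):
    """Probability of exactly k successes out of n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(n, p, k):
    """Probability of k or fewer successes out of n independent trials."""
    return sum(binom_pdf(n, p, i) for i in range(k + 1))

# Example: 10 fair coin flips.
exactly_3 = binom_pdf(10, 0.5, 3)   # 120/1024
at_most_3 = binom_cdf(10, 0.5, 3)   # (1 + 10 + 45 + 120)/1024
print(exactly_3, at_most_3)
```

The CDF is just the PDF added up from 0 through k, which is why "at least k" questions are usually answered as 1 minus the CDF at k - 1.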
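
The "General Formula for z-int" card (statistic ± multiplier × SE) can be worked through for a one-proportion interval. The sketch below assumes a random sample and the 10% condition, which code cannot verify, and only checks the count of successes and failures. The function name and the counts (60 successes out of 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def one_prop_z_interval(successes, n, confidence=0.95):
    """Return (low, high) for a one-proportion z-interval."""
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")
    p_hat = successes / n                                # point estimate
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # multiplier
    se = sqrt(p_hat * (1 - p_hat) / n)                   # standard error
    margin = z_star * se
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 60 successes in 100 trials.
low, high = one_prop_z_interval(60, 100)
print(round(low, 3), round(high, 3))
```

With these numbers the interval is roughly 0.504 to 0.696, which would be read as: we can be 95% confident that the actual population proportion lies between about 50.4% and 69.6%.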
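
The regression cards (LSRL, residuals, r, and r-squared) fit together in one small calculation. This is a sketch using the usual textbook formulas, slope = Sxy/Sxx and intercept = ȳ - slope × x̄; the function name `lsrl` and the five data points are invented for illustration.

```python
from math import sqrt
from statistics import mean

def lsrl(xs, ys):
    """Return (intercept a, slope b, correlation r) for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    return a, b, r

# Hypothetical data, chosen so the arithmetic is easy to follow by hand.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = lsrl(xs, ys)

# Residual = actual y - predicted y; residuals from the LSRL sum to zero.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b, r ** 2, residuals)
```

Here the line is y-hat = 2.2 + 0.6x and r² = 0.6, so 60% of the variation in y can be explained by the linear relationship with x.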

Created by: Avery4
