Stats & Probability
Vocabulary related to statistics and probability
Term | Definition |
---|---|
content validity | when the test measures what it is supposed to measure. Ex: An algebra test has content validity if it measures how well students actually understand algebra. |
criterion validity | when the test is a good predictor of performance on a related measure, such as another test or a real-world outcome. Ex: If students who do well on the ACT also do well in college, then the ACT has criterion validity. |
test-retest | A test for reliability. Give the test to a group of people, then give the same test again to the same group after a period of time. Their results should be similar for the test to be reliable. |
parallel forms | A test for reliability. Create 2 different tests that are equally valid. Give both tests to the same group of people. Each person's result on one test should be similar to their result on the other for the tests to be considered reliable. |
convenience sampling | A NON-RANDOM sampling method. The sample is chosen based on who is easiest to reach. Example: surveying everyone in one class, or posting a survey on social media and using whoever responds. |
quota sampling | A NON-RANDOM sampling method. Split the population into categories (by gender, ethnicity, grade, etc.). Then survey a proportionate number of people from each category, chosen non-randomly. Like convenience sampling, this is NOT random sampling. |
random sampling | A RANDOM sampling method. Choose people to sample from the population randomly. Example: Pull names from a hat, or roll a die. |
stratified sampling | A RANDOM sampling method. Split your population into groups, just like quota sampling. Within each group, take a RANDOM sample. Example: Split students into freshmen, sophomores, juniors, and seniors, then pull names from a hat within each group. |
systematic sampling | A RANDOM sampling method. Create a list of every member of the population. Choose every kth member on the list to sample, usually starting from a randomly chosen member among the first k. Example: Give the survey to every 50th student on the roster for North in alphabetical order. |
variance | standard deviation squared |
outliers | Data points more than 1.5 × IQR above Q3 (greater than Q3 + 1.5 × IQR) OR more than 1.5 × IQR below Q1 (less than Q1 − 1.5 × IQR) |
interpolate | Make a prediction INSIDE the range of the given data. This is generally a reliable prediction. Example: If you have data on students aged 6-12, make a prediction about students who are 8.5 years old. |
extrapolate | Make a prediction OUTSIDE the range of the given data. This is NOT considered a reliable prediction, because the trend may not continue beyond the data. Example: If you have data on students aged 6-12, use that data to make a prediction about a 14-year-old student. |
mutually exclusive | Mutually exclusive events CANNOT both occur. Ex: You cannot be in this class as a freshman. So being a freshman and taking HL IB Math are mutually exclusive events. |
independent | If two events are independent, one event happening doesn't change the probability of the other occurring. Ex: Roll a die and flip a coin. The result on the die is independent of the result on the coin. |
P(A ∪ B) | "Probability of A or B (or both) occurring." The union of A and B. Represented by everything inside either circle on a Venn diagram. |
P(A ∩ B) | "Probability of both A and B occurring." The intersection of A and B. On a Venn diagram, the region where the two circles overlap. |
P(A \| B) | "Probability of A given B." If we know that event B has occurred, what is the probability that event A also occurred? Calculated as P(A ∩ B) / P(B). |
discrete | Has values that can be COUNTED. Example: number of people, amount of money, etc. |
continuous | Has values that can be MEASURED. Example: height, length, weight, hours of sleep |
modal class | Mode for continuous data (the most frequent interval) |
interquartile range (IQR) | IQR = Q3 - Q1 |
quartiles | Q1 is the 25th percentile: 25% of the data is smaller than it. Q2 is the 50th percentile (aka the median): 50% of the data is smaller than it. Q3 is the 75th percentile: 75% of the data is smaller than it. |
discrete random variable | A variable that can take on a countable number of values from a random experiment. Ex: S, the sum when two dice are rolled together, is a discrete random variable. |
probability distribution function | Describes the probability that a random variable takes on each possible value. Usually written in a table with the possible values on top and the corresponding probabilities below. The probabilities must add up to 1. |
expected value | The mean of a random variable: the average value you would expect over many trials. Found by multiplying each value of the random variable by its corresponding probability, then adding all of those products together: E(X) = Σ x·P(X = x). |
fair game | When a random variable represents a player's gain from a game (such as money won minus the cost to play), the game is fair if the expected value of the gain is 0. This usually means the cost of playing equals the expected value of the prize. |
Type I error | Rejecting the null hypothesis when the null hypothesis is actually correct. Example: Convicting someone as guilty of a crime, when really they are innocent. |
Type II error | Not rejecting the null hypothesis when the null hypothesis is actually incorrect. Example: Declaring someone "not guilty" when they actually did commit the crime. |
p-value | Assuming the null hypothesis is true, the probability of getting a result at least as extreme as the one observed. If the p-value is less than the significance level (the probability of a Type I error we are willing to accept, such as 5%), we reject the null hypothesis. |
Null hypothesis | H₀. The default claim we assume to be true unless the data give strong evidence against it. Example: Eating jelly beans is independent of getting cancer. |
Alternative hypothesis | H₁. The claim we are trying to find evidence for, in place of the null hypothesis. It must be mutually exclusive with the null hypothesis. Example: Eating jelly beans is not independent of getting cancer. |
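The test-retest and parallel forms entries say a reliable test gives "similar" results, but they do not say how to measure similarity. One common choice, assumed here rather than taken from the table, is the Pearson correlation between each person's two scores. Values close to 1 suggest a reliable test. The function name `pearson_r` and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two score lists of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores: the same five students take the same test twice
first_attempt = [72, 85, 90, 64, 78]
second_attempt = [70, 88, 91, 60, 80]

r = pearson_r(first_attempt, second_attempt)
print(f"test-retest correlation: {r:.3f}")
```

For parallel forms, the same function would compare each person's score on test A with their score on test B.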
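The three random sampling methods in the table (random, stratified, systematic) can be sketched with Python's standard `random` module. The roster, the grade groups, and the helper function names are made up for illustration. The systematic sampler assumes a random starting point among the first k members, which is one common way to add randomness to "every kth member".

```python
import random


def simple_random_sample(population, n, rng):
    # Every member has the same chance of being chosen (like names from a hat)
    return rng.sample(population, n)


def stratified_sample(strata, sizes, rng):
    # strata: group name -> list of members; sizes: group name -> sample size
    # A separate simple random sample is drawn inside each group
    return {group: rng.sample(members, sizes[group]) for group, members in strata.items()}


def systematic_sample(population, k, rng):
    # Random start among the first k members, then every kth member after it
    start = rng.randrange(k)
    return population[start::k]


rng = random.Random(2024)  # fixed seed so the example is repeatable
roster = [f"student{i:03d}" for i in range(200)]  # hypothetical alphabetical roster
strata = {
    "freshmen": roster[0:50],
    "sophomores": roster[50:100],
    "juniors": roster[100:150],
    "seniors": roster[150:200],
}

srs = simple_random_sample(roster, 10, rng)
by_grade = stratified_sample(strata, {g: 5 for g in strata}, rng)
every_50th = systematic_sample(roster, 50, rng)
print(srs, by_grade, every_50th, sep="\n")
```

With 200 students and k = 50, the systematic sample always contains exactly 4 students.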
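To check the variance entry numerically, the standard `statistics` module can compute both the standard deviation and the variance. This sketch assumes the population versions (`pstdev`, `pvariance`, which divide by n). The sample versions `stdev` and `variance` divide by n − 1 instead, and which one a course expects is a convention. The data set is hypothetical.

```python
from statistics import pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data set with mean 5

sigma = pstdev(data)   # population standard deviation
var = pvariance(data)  # population variance (divides by n)

print(sigma, var)
print(var == sigma ** 2)  # variance is the standard deviation squared
```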
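The quartiles, IQR, and outliers entries fit together as below. Quartiles can be computed with several slightly different conventions. This sketch assumes the "median of each half" method, which leaves out the overall median when there is an odd number of values. Other tools, such as `statistics.quantiles`, may give slightly different Q1 and Q3. The function names and data are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the 'median of each half' convention.

    With an odd number of values, the overall median is left out of both halves.
    Needs at least 2 values.
    """
    xs = sorted(data)
    n = len(xs)
    lower_half = xs[: n // 2]
    upper_half = xs[(n + 1) // 2 :]
    return median(lower_half), median(xs), median(upper_half)


def outliers(data):
    """Values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


scores = [3, 7, 8, 5, 12, 14, 21, 13, 18, 60]  # hypothetical data
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3, q3 - q1)  # 7 12.5 18 11
print(outliers(scores))     # [60]
```

Here the fences are 7 − 16.5 = −9.5 and 18 + 16.5 = 34.5, so only 60 counts as an outlier.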
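The interpolate and extrapolate entries are about where the prediction falls relative to the data, not how it is computed. This sketch assumes a least-squares line as the prediction model, which the table does not specify. The heights are hypothetical and chosen to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(xs, ys, x_new):
    """Predict y at x_new and label it interpolation or extrapolation."""
    intercept, slope = fit_line(xs, ys)
    inside = min(xs) <= x_new <= max(xs)
    kind = "interpolation" if inside else "extrapolation"
    return intercept + slope * x_new, kind


ages = [6, 7, 8, 9, 10, 11, 12]
heights_cm = [115, 121, 127, 133, 139, 145, 151]  # hypothetical heights

print(predict(ages, heights_cm, 8.5))  # (130.0, 'interpolation')
print(predict(ages, heights_cm, 14))   # (163.0, 'extrapolation')
```

The code produces a number in both cases. The label is a reminder that the age-14 value assumes the trend continues past the data.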
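The mutually exclusive and independent entries can be checked by listing every outcome of the die-and-coin example. The sketch uses two standard tests: A and B are independent when P(A ∩ B) = P(A) × P(B), and mutually exclusive when P(A ∩ B) = 0. Exact fractions avoid rounding. The event function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: roll a fair die and flip a fair coin (12 equally likely outcomes)
outcomes = list(product(range(1, 7), "HT"))


def prob(event):
    """Probability of an event, given as a True/False test on one outcome."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


def rolled_six(o):
    return o[0] == 6


def got_heads(o):
    return o[1] == "H"


def rolled_odd(o):
    return o[0] % 2 == 1


def rolled_even(o):
    return o[0] % 2 == 0


# Independent: P(A and B) equals P(A) * P(B)
p_six_and_heads = prob(lambda o: rolled_six(o) and got_heads(o))
independent = p_six_and_heads == prob(rolled_six) * prob(got_heads)

# Mutually exclusive: A and B can never both happen, so P(A and B) = 0
mutually_exclusive = prob(lambda o: rolled_odd(o) and rolled_even(o)) == 0

print(independent, mutually_exclusive)  # True True
```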
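The union, intersection, and conditional probability entries can be worked through with Python sets for a single fair die roll. The events A (even roll) and B (roll greater than 3) are chosen for illustration. In Python, `|` on sets means union, which is unrelated to the "given" bar in P(A | B).

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}            # outcomes of one fair die roll
A = {x for x in die if x % 2 == 0}  # A: roll is even -> {2, 4, 6}
B = {x for x in die if x > 3}       # B: roll is greater than 3 -> {4, 5, 6}


def P(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event), len(die))


p_union = P(A | B)             # set union {2, 4, 5, 6}
p_intersection = P(A & B)      # set intersection {4, 6}
p_A_given_B = P(A & B) / P(B)  # conditional probability P(A | B)

print(p_union, p_intersection, p_A_given_B)  # 2/3 1/3 2/3
```

The union also agrees with the addition rule P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/2 + 1/2 − 1/3 = 2/3.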
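The modal class entry can be found directly from a grouped frequency table. This sketch assumes each class (lower, upper) means lower ≤ h < upper, and it returns every class that ties for the highest frequency. The sleep data and the name `modal_classes` are hypothetical.

```python
# Hypothetical grouped data: hours of sleep, as (lower, upper) class -> frequency
frequency_table = {
    (4, 5): 2,
    (5, 6): 5,
    (6, 7): 9,
    (7, 8): 11,
    (8, 9): 6,
}


def modal_classes(freq):
    """Return every class with the highest frequency (there can be a tie)."""
    top = max(freq.values())
    return [interval for interval, count in freq.items() if count == top]


print(modal_classes(frequency_table))  # [(7, 8)]
```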
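The discrete random variable, probability distribution, and expected value entries all use the two-dice example. This sketch builds the distribution of S by counting the 36 equally likely outcomes. It then applies E(S) = Σ s·P(S = s). The helper name `expected_value` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# S = sum of two fair dice; count how many of the 36 outcomes give each sum
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability distribution of S: value -> probability
distribution = {s: Fraction(c, 36) for s, c in sorted(counts.items())}


def expected_value(dist):
    """E(X): multiply each value by its probability, then add the products."""
    return sum(x * p for x, p in dist.items())


print(distribution[7])               # 1/6
print(sum(distribution.values()))    # 1
print(expected_value(distribution))  # 7
```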
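The fair game entry can be tested by computing the expected gain, meaning prize minus cost, for a hypothetical game. In this example a player rolls a die and wins $6 on a six. The game is fair only at the cost where the expected gain is 0. The game and function names are illustrative assumptions.

```python
from fractions import Fraction


def expected_gain(prizes, cost):
    """prizes: list of (prize, probability) pairs. Gain = prize - cost."""
    return sum(p * (prize - cost) for prize, p in prizes)


def is_fair(prizes, cost):
    return expected_gain(prizes, cost) == 0


# Hypothetical game: roll a die, win $6 on a six, otherwise win nothing
game = [(6, Fraction(1, 6)), (0, Fraction(5, 6))]

print(expected_gain(game, cost=2))  # -1 (player loses $1 per game on average)
print(is_fair(game, cost=1))        # True (cost equals the expected prize of $1)
```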
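The hypothesis-testing entries (null and alternative hypotheses, p-value, Type I error) come together in a complete test. This sketch uses an exact one-sided binomial test, which is just one example of a test. The coin, the 16 heads in 20 flips, and the 5% significance level are all hypothetical. H₀ is "the coin is fair (p = 1/2)" and H₁ is "the coin favors heads (p > 1/2)".

```python
from fractions import Fraction
from math import comb


def binomial_p_value_upper(n, k, p0=Fraction(1, 2)):
    """One-sided p-value P(X >= k) for X ~ Binomial(n, p0), i.e. assuming H0 is true."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


alpha = 0.05  # significance level: the Type I error rate we are willing to accept
p_value = binomial_p_value_upper(20, 16)  # 16 heads observed in 20 flips
reject_h0 = p_value < alpha

print(f"p-value = {float(p_value):.4f}, reject H0: {reject_h0}")  # p-value = 0.0059, reject H0: True
```

Rejecting H₀ here could still be a Type I error if the coin is actually fair. Using α = 0.05 caps how often that can happen. It does not make the p-value itself the Type I error probability.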