Front | Back |
Sample | Subgroup of the population |
Sampling | Process of selecting sample from population |
Random sampling | Independent selection |
Descriptive vs. Inferential Statistics | – Descriptive: primary purpose is to describe some aspect of the data
– Inferential: primary purpose is to infer (to estimate or to make a decision, test a hypothesis) |
All inferential statistics have the following in common: | – use of some descriptive statistic
– use of probability
– potential for estimation
– sampling variability
– sampling distributions
– use of a theoretical distribution
– two hypotheses, two decisions, two types of error |
Research defined | Structured Problem Solving |
Scientific methods: steps (cyclic) | – 1. encounter and identify problem
– 2. formulate hypotheses, define variables
– 3. think through consequences of hypotheses
– 4. design & run study, collect data, compute statistics, test hypotheses
– 5. draw conclusions |
Variable | entity that is free to take on different values |
Independent variable (IV) | its values are manipulated by the researcher, comes first in time |
Dependent variable (DV) | measured by researcher, follows the IV in time |
Population | Target group for inference |
Extraneous variable (EV) | controlled by researcher
• randomization of subjects to groups
• keep all subjects constant on EV
• include EV in the design of the experiment |
Predictor variable (PV) | comes first in time but there is no manipulation, analogous to IV. |
Criterion variable (CV): | follows PV in time, analogous to DV. |
Causal relationship: | IV causes the DV |
Predictive relationship: | PV predicts the CV |
2 Types of research | 1. experimental 2. observational |
True experiment | • manipulation of IV
• randomization of subjects to groups
• causal relationship between IV and DV |
Observational research | • no manipulation
• minimal control of EV
• predictive relationship between PV and CV |
Stem and Leaf Display | • The first digit(s) of a score form the stem, the last digit(s) form the leaf.
• Aim for 10 to 20 stems in total.
• Number of stems per digit depends on total number of stems: can do 1, 2, or 5 stems per digit. |
Description With Statistics
Aspects or characteristics of data that we can describe are: | – Middle
– Spread
– Skewness
– Kurtosis |
Other words that describe Middle | central tendency, location, center |
Statistics that Measure middle are: | mean, median, mode
• “Middle” is the aspect of data we want to describe.
• We describe/measure the middle of data in a population with the parameter μ (‘mu’); we usually don’t know μ, so we estimate it with X-bar. |
Other words that describe Spread | variability, dispersion, scatter |
Statistics that Measure spread are: | range, variance, standard deviation, midrange
• “Spread” is the aspect of data we want to describe.
• Any statistic that describes/measures spread should have these characteristics: it should
– Equal zero when the spread is zero.
– Inc |
Skewness | =departure from symmetry
– Positive skewness = tail (extreme scores) in positive direction
– Negative skewness = tail (extreme scores) in negative direction
(The Few name the Skew) |
Kurtosis | peakedness relative to normal curve |
Sample Mean | -The sample mean is the sum of the scores divided by the number of scores, and is symbolized by X-bar, X̄ = ΣX/N
-For example, for X1=4, X2=1, X3=7, N=3, ΣX=12 and X̄ = ΣX/N = 12/3 = 4
• Characteristics:
– X-bar is the balance point |
Sample Median | • The median is the middle of the ordered scores, and is symbolized as X50.
• Median position (as distinct from the median itself) is (N+1)/2 and is used to find the median.
• Example: X1=4, X2=1, X3=7, then N=3.
• Characteristic |
Sample Mode | • The mode is the most frequent score.
• Examples:
– 1 1 4 7, the mode is 1.
– 1 1 4 7 7, there are two modes, 1 and 7.
– 1 4 7, there is no mode.
• Characteristics:
– Has problems: more than one, or none; maybe not in the mid |
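The three measures of middle can be checked with a short Python sketch. It uses the standard-library `statistics` module and the example scores from the cards. One difference to keep in mind: when every score occurs equally often, `multimode` returns all of them, whereas the card treats that case as "no mode".

```python
from statistics import mean, median, multimode

scores = [4, 1, 7]                 # X1=4, X2=1, X3=7 from the cards
n = len(scores)

x_bar = mean(scores)               # sum of the scores / number of scores
median_position = (n + 1) / 2      # position to count to in the ordered scores
x50 = median(scores)               # the middle of the ordered scores

# multimode returns every most-frequent score
modes_one = multimode([1, 1, 4, 7])
modes_two = multimode([1, 1, 4, 7, 7])

print(x_bar, median_position, x50, modes_one, modes_two)
```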
Spread cont. | • We describe/measure the spread of data in a sample with the statistics:
– Range = high score-low score.
– Midrange, MR.
– Sample variance, s*².
– Sample standard deviation, s*.
– Unbiased variance estimate, s².
– s.
• We des |
Midrange (MR) | • Formula is MR=UH-LH
– UH=upper hinge
– LH=lower hinge
– Hinges cut off 25% of the data in each tail
• Hinge position is ([median position]+1)/2.
– [median position] is the whole number part of the median position (remember, median p |
Hinge position | ([median position]+1)/2
– [median position] is the whole number part of the median position (remember, median pos.=(N+1)/2)
• Use hinge position to count in from the tails to find the hinges. |
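A minimal Python sketch of the hinge-counting procedure follows. The function name `hinges` and the data are hypothetical. When the hinge position ends in .5, the sketch averages the two neighbouring scores; that is the usual convention, but it is an assumption here because the cards do not state it.

```python
import math

def hinges(scores):
    """Return (LH, UH) by counting in from each tail to the hinge position."""
    xs = sorted(scores)
    n = len(xs)
    median_pos = (n + 1) / 2
    # [median position] is the whole-number part of the median position
    hinge_pos = (math.floor(median_pos) + 1) / 2
    lo = math.floor(hinge_pos)   # 1-based positions on either side
    hi = math.ceil(hinge_pos)    # of the hinge position
    # Assumption: average the two neighbours when hinge_pos ends in .5
    lower = (xs[lo - 1] + xs[hi - 1]) / 2   # count in from the low tail
    upper = (xs[n - lo] + xs[n - hi]) / 2   # count in from the high tail
    return lower, upper

data = [1, 2, 3, 4, 5, 6, 7, 8]   # hypothetical scores
lh, uh = hinges(data)
mr = uh - lh                      # MR = UH - LH
print(lh, uh, mr)
```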
Sample Standard Deviation, s*; Sample Variance, s*² | • Definitional formula: s*²=Σ(X-X̄)²/N, the average squared deviation from X-bar.
Sample Standard Deviation= s*
Unbiased Variance Estimate, s² |
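The definitional formula can be worked through in Python with the same three scores used on the mean card. The N-1 divisor for the unbiased estimate s² is the standard definition and is an assumption here, since the card names s² without giving its formula.

```python
import math
from statistics import mean, variance

scores = [4, 1, 7]
n = len(scores)
x_bar = mean(scores)

# s*^2: the average squared deviation from X-bar (divide by N)
s_star_sq = sum((x - x_bar) ** 2 for x in scores) / n
s_star = math.sqrt(s_star_sq)

# s^2: statistics.variance divides by N-1 (assumed definition of the
# unbiased variance estimate; the divisor is not shown on the card)
s_sq = variance(scores)

print(s_star_sq, s_star, s_sq)
```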
Box-plots | • A pictorial description that uses a box to show the middle of the data and lines called whiskers to show the tails of a distribution. |
3 Parts to Box Plot | 1.) Box
2.) Whiskers
3.) Outliers |
Box | – Upper end is at the UH, lower end is at the LH - Line across the middle is X50 |
Whiskers | – Whiskers are lines drawn from the ends of the box (the hinges) to adjacent values, UAV & LAV.
– Adjacent values are the first real data values inside the inner fences.
– Inner fences, upper and lower
• Upper, UIF=UH+1.5MR
• Lower, LIF= L |
Outliers | Outliers: outside whiskers, marked with |
Midrange (MR) | UH- LH |
z Scores | • The aspect of the data we want to describe/measure is relative position. • z scores are statistics that describe the relative position of something in its distribution. |
Z score formula | z is something minus its mean divided by its standard deviation. |
z score characteristics | – The mean of a distribution of z scores is zero.
– The variance of a distribution of z scores is one.
– The shape of a distribution of z scores is reflective, the shape is the same as the shape of the distribution of the Xs. |
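The first two characteristics can be verified numerically. This sketch assumes the standard deviation in the z formula is s* (the divide-by-N version), which is what makes the variance of the z scores come out to one.

```python
from statistics import mean, pstdev

scores = [4, 1, 7]
x_bar = mean(scores)
s_star = pstdev(scores)            # divide-by-N standard deviation (assumed)

# z = (score - mean) / standard deviation
z = [(x - x_bar) / s_star for x in scores]

z_mean = mean(z)
z_var = sum((v - z_mean) ** 2 for v in z) / len(z)
print(z, z_mean, z_var)
```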
Characteristics of Normal Distributions | – Symmetric, continuous, unimodal.
– Bell-shaped.
– Scores range from -∞ to +∞.
– Mean, median, and mode are all the same value.
– Each distribution has two parameters, μ and σ². |
Use of Z score | • We use this distribution to get probabilities associated with a z score (probability, proportion, and area under the curve are synonymous).
- look up z in table to find probabilities. |
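In place of a printed table, the standard library's `statistics.NormalDist` gives the same areas. The z values below are hypothetical examples.

```python
from statistics import NormalDist

std_normal = NormalDist()          # mean 0, standard deviation 1

# Area under the curve below z = 1.96 (what a z table reports)
p_below = std_normal.cdf(1.96)
# Area in the upper tail beyond z = 1.96
p_above = 1 - p_below
# Area between z = -1 and z = +1
p_between = std_normal.cdf(1.0) - std_normal.cdf(-1.0)

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```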
Correlation | – Defined as the degree of linear relationship between X and Y. – Is measured/described by the statistic r. |
Regression | – Is concerned with the prediction of Y from X.
– Forms a prediction equation to predict Y from X.
– Uses the formula for a straight line, Y’=bX+a.
– Y’ is the predicted Y score on the criterion variable.
– b is the slope, b=ΔY/ΔX=rise/run.
– |
r= | r=Σ(z_X z_Y)/N, the average product of z scores for X and Y
– Works with two variables, X and Y
– -1≤r≤1, r measures positive or negative relationships
– Measures only the degree of linear relationship
– r²=proportion of variability in Y that is e |
r²= | proportion of variability in Y that is explained by X. |
Correlation: Undefined | If there is no spread in X or Y, then r is undefined. Note that any z is undefined if the standard deviation is zero, and r=Σ(z_X z_Y)/N. |
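The "average product of z scores" definition translates directly into Python. The function names and data are hypothetical, and the z scores use the divide-by-N standard deviation. The sketch raises an error when X or Y has no spread, matching the "undefined" case.

```python
from statistics import mean, pstdev

def z_scores(values):
    m, s = mean(values), pstdev(values)   # pstdev divides by N
    if s == 0:
        raise ValueError("z is undefined when the standard deviation is zero")
    return [(v - m) / s for v in values]

def pearson_r(xs, ys):
    """r = sum of (z_X * z_Y) / N, the average product of z scores."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

r = pearson_r([1, 2, 3], [1, 3, 2])              # hypothetical data
r_perfect = pearson_r([1, 2, 3], [2, 4, 6])      # exactly linear
print(round(r, 4), round(r_perfect, 4))
```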
Population correlation coefficient, | ρ (rho) |
regression cont. | • Linear only.
• Generalize only for X values in your sample.
• Actual observed Y is different from Y’ by an amount called error, e, that is, Y=Y’+e.
• Error in regression is e=Y-Y’.
• Many different potential regression |
Line of Best Fit | The statistics b and a are computed so as to minimize the sum of squared errors. – Σe²=Σ(Y-Y’)² is a minimum. – This is called the Least Squares Criterion. |
Partition total spread | – Total = Explained + Not Explained
– This is true for proportion of spread and amount of spread.
• Proportion: 1 = r² + (1-r²)
• Amount: s²_Y = s²_Y·r² + s²_Y(1-r²) |
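The partition can be checked on a small hypothetical data set. The closed-form expressions for b and a used here are the standard least-squares solution; they are an assumption of this sketch, since the cards state the criterion but not the formulas. Variances divide by N.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for Y' = bX + a."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return b, a

xs, ys = [1, 2, 3], [1, 3, 2]                  # hypothetical data
b, a = fit_line(xs, ys)
predicted = [b * x + a for x in xs]            # Y'
errors = [y - yp for y, yp in zip(ys, predicted)]   # e = Y - Y'

n = len(ys)
y_bar = sum(ys) / n
total = sum((y - y_bar) ** 2 for y in ys) / n             # spread of Y
explained = sum((yp - y_bar) ** 2 for yp in predicted) / n
not_explained = sum(e ** 2 for e in errors) / n

r_squared = explained / total
print(b, a, total, explained, not_explained, r_squared)
```

For these data r = 0.5, so r² = 0.25 of the spread in Y is explained and 0.75 is not.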
Probability | Defined as relative frequency of occurrence. |
Sample space | all possible outcomes of an experiment |
Elementary event | a single member of the sample space |
Event | any collection of elementary events |
p(elementary event) | 1/(total number) |
p(event) | (number in the event)/(total number) |
Conditional probability | • p(A|B)=(number in [A and B])/(number in B)
• The probability of A in the redefined (reduced) sample space of B. |
Big 3 Probability Rules | 1. independence 2. multiplication, mutually exclusive 3. addition |
Independence (1) | events A and B are independent if
• p(A|B)=p(A)
• The probability of A is not changed by reducing the sample space to B. |
Multiplication (And) Rule (2) | • p(A and B)=p(A)p(B|A)=p(A|B)p(B) |
Mutually exclusive: | • Events A and B do not have any elementary events in common.
• Events A and B cannot occur simultaneously.
• p(A and B)=0 |
Addition (Or) Rule (3) | p(A or B)=p(A)+p(B)-p(A and B) |
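The counting definitions and the three rules can be checked on one roll of a fair die. The events A and B are hypothetical, and the sketch assumes all six outcomes are equally likely, which is what the "number in the event / total number" formula requires.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die (assumed)
A = {2, 4, 6}                       # hypothetical event: even number
B = {4, 5, 6}                       # hypothetical event: more than 3

def p(event):
    """(number in the event) / (total number)."""
    return Fraction(len(event), len(sample_space))

def p_given(event, given):
    """p(event | given): count inside the reduced sample space."""
    return Fraction(len(event & given), len(given))

independent = p_given(A, B) == p(A)              # independence check
p_and = p(A) * p_given(B, A)                     # multiplication (and) rule
p_or = p(A) + p(B) - p(A & B)                    # addition (or) rule

print(independent, p_and, p_or)
```

Here p(A|B) = 2/3 differs from p(A) = 1/2, so A and B are not independent; they are also not mutually exclusive because they share the outcomes 4 and 6.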
The sampling distribution of X-bar | – Has the purpose of any sampling distribution: to obtain probabilities…
– Has the definition of any sampling distribution: the distribution of a statistic.
– Has specific characteristics:
• Mean: μ_X̄ = μ
• Variance: σ²_X̄ = σ²/N
• Shape i |
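The mean and variance of the sampling distribution of X-bar can be confirmed by listing every possible sample from a tiny population. The population and sample size are hypothetical, and sampling is assumed to be independent (with replacement). Exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 3]   # hypothetical population
N = 2                    # sample size

def mean(values):
    return Fraction(sum(values), len(values))

def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Every possible sample of size N, drawn independently (with replacement)
sample_means = [mean(sample) for sample in product(population, repeat=N)]

mu, sigma_sq = mean(population), variance(population)
mean_of_means = mean(sample_means)          # should equal mu
variance_of_means = variance(sample_means)  # should equal sigma_sq / N

print(mean_of_means, variance_of_means)
```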
Hypothesis testing | is the process of testing tentative guesses about relationships between variables in populations. These relationships between variables are evidenced in a statement, a hypothesis, about a population parameter. |
Test statistic | a statistic used only for the purpose of testing hypotheses; e.g. z_X̄ |
Assumptions | conditions placed on a test statistic necessary for its valid use in hypothesis testing; for z_X̄, the assumptions are that the population is normal in shape and that the observations are independent. |
Null hypothesis | the hypothesis that we test; Ho. |
Alternative hypothesis | where we put what we believe; H |
Significance level | the standard for what we mean by a “small” probability in hypothesis testing; α.
The significance level is the small probability used in hypothesis testing to determine an unusual event that leads you to reject Ho.
– The significance level is sym |
Directional vs. Non-Directional Hypothesis | >, <, or =
• Directional hypotheses specify a particular direction for values of the parameter.
– IQ of deaf children example: Ho: μ≥100, H1: μ<100.
• Non-directional hypotheses do not specify a particular direction for values of the paramet |
One- and two-tailed tests | – A one-tailed test is a statistical test that uses only one tail of the sampling distribution of the test statistic.
– A two-tailed test is a statistical test that uses two tails of the sampling distribution of the test statistic. |
Critical values | values of the test statistic that cut off α or α/2 in the tail(s) of the theoretical reference distribution. |
Rejection values | the values of the test statistic that lead to rejection of Ho |
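Critical values for a z test statistic can be read from the standard normal distribution in Python. The significance level of .05 and the observed z values are hypothetical, and the sketch assumes the standard normal is the theoretical reference distribution.

```python
from statistics import NormalDist

alpha = 0.05                       # hypothetical significance level
std_normal = NormalDist()          # assumed reference distribution

# One-tailed: the critical value cuts off alpha in a single tail
one_tailed_cv = std_normal.inv_cdf(1 - alpha)
# Two-tailed: the critical values cut off alpha/2 in each tail
two_tailed_cv = std_normal.inv_cdf(1 - alpha / 2)

def reject_two_tailed(z_observed):
    """Rejection values: any z at or beyond either critical value."""
    return abs(z_observed) >= two_tailed_cv

print(round(one_tailed_cv, 3), round(two_tailed_cv, 3))
print(reject_two_tailed(2.5), reject_two_tailed(1.0))
```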
p-Value Decision Rules | • Reject Ho if
– ½ the SAS p-value < α, and
– the observed z_X̄ is in the tail specified by H1. |