|Sample ||Subgroup of the population|
|Sampling ||Process of selecting sample from population|
|Random sampling ||Independent selection|
|Descriptive vs. Inferential Statistics ||– Descriptive: primary purpose is to describe some aspect of the data
Inferential: primary purpose is to infer (to estimate or to make a decision, test a hypothesis)|
|All inferential statistics have the following in common: ||– use of some descriptive statistic
– use of probability
– potential for estimation
– sampling variability
– sampling distributions
– use of a theoretical distribution
– two hypotheses, two decisions, two types of error|
|Research defined ||Structured Problem Solving|
|Scientific methods: steps (cyclic) ||– 1. encounter and identify problem
– 2. formulate hypotheses, define variables
– 3. think through consequences of hypotheses
– 4. design & run study, collect data, compute statistics, test hypotheses
– 5. draw conclusions|
|Variable ||entity that is free to take on different values|
|Independent variable (IV) ||its values are manipulated by the researcher, comes first in time|
|Dependent variable (DV) ||measured by researcher, follows the IV in time|
|Population ||Target group for inference|
|Extraneous variable (EV) ||controlled by researcher
• randomization of subjects to groups
• keep all subjects constant on EV
• include EV in the design of the experiment|
|Predictor variable (PV) ||comes first in time but there is no manipulation, analogous to IV.|
|Criterion variable (CV): ||follows PV in time, analogous to DV.|
|Causal relationship: ||IV causes the DV|
|Predictive relationship: ||PV predicts the CV|
|2 Types of research ||1. experimental 2. observational|
|True experiment ||• manipulation of IV
• randomization of subjects to groups
• causal relationship between IV and DV|
|Observational research ||• no manipulation
• minimal control of EV
• predictive relationship between PV and CV|
|Stem and Leaf Display ||• The first digit(s) of a score form the stem, the last digit(s) form the leaf.
• We want 10-20 stems in total.
• Number of stems per digit depends on total number of stems: can do 1, 2, or 5 stems per digit.|
|Description With Statistics: aspects or characteristics of data that we can describe ||– Middle
– Spread
– Skewness
– Kurtosis
– Relative position|
|Other words that describe Middle ||central tendency, location, center|
|Statistics that Measure middle are: ||mean, median, mode
• “Middle” is the aspect of data we want to describe.
• We describe/measure the middle of data in a population with the parameter μ (‘mu’); we usually don’t know μ, so we estimate it with X-bar.|
|Other words that describe Spread ||variability, dispersion, scatter|
|Statistics that Measure spread are: ||range, variance, standard deviation, midrange
• “Spread” is the aspect of data we want to describe.
• Any statistic that describes/measures spread should have these characteristics: it should
– Equal zero when the spread is zero.
– Get larger as the spread gets larger.|
|Skewness ||=departure from symmetry
– Positive skewness = tail (extreme scores) in positive direction
– Negative skewness = tail (extreme scores) in negative direction
(The Few name the Skew)|
|Kurtosis ||peakedness relative to normal curve|
|Sample Mean ||– The sample mean is the sum of the scores divided by the number of scores, and is symbolized by X-bar: X-bar = ΣX/N.
– For example, for X1=4, X2=1, X3=7: N=3, ΣX=12, and X-bar = ΣX/N = 12/3 = 4.
– X-bar is the balance point of the distribution.|
|Sample Median ||• The median is the middle of the ordered scores, and is symbolized as X50.
• Median position (as distinct from the median itself) is (N+1)/2 and is used to find the median.
• Example: X1=4, X2=1, X3=7, then N=3; median position = (3+1)/2 = 2, so X50 = 4 (the middle of the ordered scores 1, 4, 7).|
|Sample Mode ||• The mode is the most frequent score.
– 1 1 4 7, the mode is 1.
– 1 1 4 7 7, there are two modes, 1 and 7.
– 1 4 7, there is no mode.
– Has problems: more than one mode, or none; may not be in the middle.|
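The three middle measures from the cards above can be checked quickly in Python; a minimal sketch using the small example data sets from the cards (the (N+1)/2 median rule below assumes odd N, as in the example):

```python
import statistics

X = [4, 1, 7]  # example scores from the cards (N = 3)

# Sample mean: X-bar = (sum of scores) / N = 12 / 3 = 4
mean = sum(X) / len(X)

# Sample median: middle of the ordered scores; median position = (N+1)/2 = 2
median = sorted(X)[(len(X) + 1) // 2 - 1]

# Mode: most frequent score. statistics.mode illustrates the "has problems"
# caveat: with ties it just returns the first mode encountered.
mode_example = statistics.mode([1, 1, 4, 7])

print(mean, median, mode_example)  # 4.0 4 1
```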
|Spread (cont.) ||• We describe/measure the spread of data in a sample with the statistics:
– Range = high score - low score.
– Midrange, MR.
– Sample variance, s*².
– Sample standard deviation, s*.
– Unbiased variance estimate, s².|
|Midrange (MR) ||• Formula is MR = UH - LH
– UH = upper hinge
– LH = lower hinge
– Hinges cut off 25% of the data in each tail
• Hinge position is ([median position]+1)/2.
– [median position] is the whole number part of the median position (remember, median pos. = (N+1)/2).|
|Hinge position ||([median position]+1)/2
– [median position] is the whole number part of the median position (remember, median pos.=(N+1)/2)
• Use hinge position to count in from the tails to find the hinges.|
|Sample Variance, s*² ||• Definitional formula: s*² = Σ(X - X-bar)²/N, the average squared deviation from X-bar.
• Sample standard deviation: s* = √(s*²).
• Unbiased variance estimate: s² = Σ(X - X-bar)²/(N-1).|
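The variance formulas above differ only in the divisor (N vs. N-1); a minimal Python sketch using the cards' example scores:

```python
X = [4, 1, 7]
N = len(X)
xbar = sum(X) / N                                # X-bar = 4

# Sample variance s*^2: the average squared deviation from X-bar
s_star_sq = sum((x - xbar) ** 2 for x in X) / N  # (0 + 9 + 9) / 3 = 6.0

# Sample standard deviation s* is the square root of s*^2
s_star = s_star_sq ** 0.5

# Unbiased variance estimate s^2 divides by N-1 instead of N
s_sq = sum((x - xbar) ** 2 for x in X) / (N - 1) # 18 / 2 = 9.0
```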
|Box-plots ||• A pictorial description that uses a box to show the middle of the data and lines called whiskers to show the tails of a distribution.|
|3 Parts to Box Plot ||1. Box 2. Whiskers 3. Outliers|
|Box ||– Upper end is at the UH, lower end is at the LH - Line across the middle is X50|
|Whiskers ||– Whiskers are lines drawn from the ends of the box (the hinges) to adjacent values, UAV & LAV.
– Adjacent values are the first real data values inside the inner fences.
– Inner fences, upper and lower:
• Upper: UIF = UH + 1.5·MR
• Lower: LIF = LH - 1.5·MR|
|Outliers ||Outliers: scores outside the whiskers (beyond the inner fences), marked individually (e.g., with an asterisk).|
|Midrange (MR) ||UH- LH|
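The hinge, midrange, and fence rules from these cards can be put together in code; a sketch with a made-up data set (the helper `hinges` and the sample values are illustrations, not from the cards), following the course's hinge-position rule:

```python
def hinges(data):
    """Lower and upper hinges via hinge position = ([median position]+1)/2."""
    xs = sorted(data)
    N = len(xs)
    med_pos = (N + 1) / 2
    hinge_pos = (int(med_pos) + 1) / 2   # [median position] = whole number part

    def value_at(pos, seq):
        # Count in from the end of seq; average two scores if pos ends in .5
        lo = int(pos) - 1
        if pos == int(pos):
            return seq[lo]
        return (seq[lo] + seq[lo + 1]) / 2

    return value_at(hinge_pos, xs), value_at(hinge_pos, xs[::-1])

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]     # 100 is a suspected outlier
LH, UH = hinges(data)
MR = UH - LH                              # midrange, as defined on the cards
UIF = UH + 1.5 * MR                       # upper inner fence
LIF = LH - 1.5 * MR                       # lower inner fence
outliers = [x for x in data if x > UIF or x < LIF]
print(LH, UH, MR, outliers)               # 3 7 4 [100]
```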
|z Scores ||• The aspect of the data we want to describe/measure is relative position. • z scores are statistics that describe the relative position of something in its distribution.|
|Z score formula ||z is something minus its mean divided by its standard deviation.|
|z score characteristics ||– The mean of a distribution of z scores is zero.
– The variance of a distribution of z scores is one.
– The shape of a distribution of z scores is unchanged: it is the same as the shape of the distribution of the Xs.|
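The first two z-score characteristics (mean 0, variance 1) hold for any data set; a quick numerical check in Python with made-up scores:

```python
xs = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # arbitrary example data
N = len(xs)
mean = sum(xs) / N                                       # 5.0
sd = (sum((x - mean) ** 2 for x in xs) / N) ** 0.5       # s* = 2.0 here

# z = (something minus its mean) divided by its standard deviation
z = [(x - mean) / sd for x in xs]

z_mean = sum(z) / N                     # always 0
z_var = sum(zi ** 2 for zi in z) / N    # always 1
```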
|Characteristics of Normal Distributions ||– Symmetric, continuous, unimodal.
– Scores range from -∞ to +∞.
– Mean, median, and mode are all the same value.
– Each distribution has two parameters, μ and σ².|
|Use of Z score ||• We use this distribution to get probabilities associated with a z score (probability, proportion, and area under the curve are synonymous).
– Look up z in a table to find probabilities.|
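Instead of a printed z table, the same lower-tail probabilities can be computed from the error function; a small Python sketch:

```python
from math import erf, sqrt

def phi(z):
    """P(Z <= z) for a standard normal -- what a z table gives you."""
    return 0.5 * (1 + erf(z / sqrt(2)))

p_below = phi(1.0)                  # about 0.8413
p_above = 1 - phi(1.0)              # about 0.1587
p_between = phi(1.0) - phi(-1.0)    # about 0.6827: the familiar "68%" band
```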
|Correlation ||– Defined as the degree of linear relationship between X and Y. – Is measured/described by the statistic r.|
|Regression ||– Is concerned with the prediction of Y from X; forms a prediction equation to predict Y from X.
– Uses the formula for a straight line, Y’ = bX + a.
– Y’ is the predicted Y score on the criterion variable.
– b is the slope, b = ΔY/ΔX = rise/run.
– a is the Y-intercept.|
|r= ||r = Σ(zX·zY)/N, the average product of z scores for X and Y
– Works with two variables, X and Y.
– -1 ≤ r ≤ +1; r measures positive or negative relationships.
– Measures only the degree of linear relationship.
– r² = proportion of variability in Y that is explained by X.|
|r2= ||proportion of variability in Y that is explained by X.|
|Correlation: Undefined ||If there is no spread in X or Y, then r is undefined. Note that any z is undefined if the standard deviation is zero, and r = Σ(zX·zY)/N.|
|Population correlation coefficient ||ρ (rho)|
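The "average product of z scores" definition of r can be verified directly; a minimal Python sketch with made-up data where Y = 2X, a perfect positive linear relationship, so r should come out 1:

```python
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 6.0, 8.0, 10.0]   # Y = 2X: perfectly linear

def zscores(v):
    n = len(v)
    m = sum(v) / n
    s = (sum((x - m) ** 2 for x in v) / n) ** 0.5   # s*, dividing by N
    return [(x - m) / s for x in v]

zX, zY = zscores(X), zscores(Y)
r = sum(a * b for a, b in zip(zX, zY)) / len(X)  # average product of z scores
# r is 1.0 here; r**2 = proportion of Y's variability explained by X
```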
|Regression (cont.) ||• Linear only.
• Generalize only for X values in the range of the observed data.
• Actual observed Y is different from Y’ by an amount called error, e; that is, Y = Y’ + e.
• Error in regression is e = Y - Y’.
• Many different potential regression lines exist; the best one is chosen by the Least Squares Criterion.|
|Line of Best Fit ||The statistics b and a are computed so as to minimize the sum of squared errors:
– Σe² = Σ(Y-Y’)² is a minimum.
– This is called the Least Squares Criterion.|
|Partition total spread ||– Total = Explained + Not Explained
– This is true for proportion of spread and amount of spread.
• Proportion: 1 = r² + (1-r²)
• Amount: s²Y = s²Y·r² + s²Y·(1-r²)|
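Putting the regression cards together numerically: a Python sketch with made-up data (the slope formula b = r·sY/sX used below is the standard least-squares form, consistent with the line passing through the means) that checks errors sum to zero and the spread partition holds:

```python
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 3.0, 5.0, 4.0, 6.0]    # made-up scores for illustration

N = len(X)
mx, my = sum(X) / N, sum(Y) / N
sx = (sum((x - mx) ** 2 for x in X) / N) ** 0.5
sy = (sum((y - my) ** 2 for y in Y) / N) ** 0.5
r = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / (N * sx * sy)  # r = 0.9

b = r * sy / sx            # slope of the least-squares line (0.9 here)
a = my - b * mx            # intercept: line passes through (X-bar, Y-bar)

Ypred = [b * x + a for x in X]                  # Y' = bX + a
errors = [y - yp for y, yp in zip(Y, Ypred)]    # e = Y - Y'; sums to zero

# Partition of spread: total = explained + not explained
total = sy ** 2
explained = total * r ** 2
unexplained = total * (1 - r ** 2)
```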
|Probability ||Defined as relative frequency of occurrence.|
|Sample space ||all possible outcomes of an experiment|
|Elementary event ||a single member of the sample space|
|Event ||any collection of elementary events|
|p(elementary event) ||1/(total number)|
|p(event) ||(number in the event)/(total number)|
|Conditional probability ||• p(A|B)=(number in [A and B])/(number in B)
• The probability of A in the redefined (reduced) sample space of B.|
|Big 3 Probability Rules ||1. independence 2. multiplication (with the special case of mutually exclusive events) 3. addition|
|Independence (1) ||Events A and B are independent if the probability of A is not changed by reducing the sample space to B; that is, p(A|B) = p(A).|
|Multiplication (And) Rule (2) ||• p(A and B)=p(A)p(B|A)=p(A|B)p(B)|
|Mutually exclusive: ||• Events A and B do not have any elementary events in common.
• Events A and B cannot occur simultaneously.
• p(A and B)=0|
|Addition (Or) Rule (3) ||p(A or B)=p(A)+p(B)-p(A and B)|
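The conditional-probability definition and the Big 3 rules can all be checked on one small sample space; a Python sketch using a single fair-die roll (the events "even" and "greater than 3" are made up for illustration):

```python
# Sample space: one roll of a fair die; events are sets of elementary events
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # "even"
B = {4, 5, 6}   # "greater than 3"

def p(event):
    return len(event) / len(S)   # (number in the event)/(total number)

# Conditional probability: p(A|B) = (number in [A and B]) / (number in B)
p_A_given_B = len(A & B) / len(B)            # 2/3

# Multiplication (And) rule: p(A and B) = p(A|B) * p(B)
assert abs(p(A & B) - p_A_given_B * p(B)) < 1e-9

# Addition (Or) rule: p(A or B) = p(A) + p(B) - p(A and B)
assert abs(p(A | B) - (p(A) + p(B) - p(A & B))) < 1e-9

# Independence check: here p(A|B) = 2/3 but p(A) = 1/2, so A and B
# are NOT independent -- reducing the sample space to B changed p(A).
```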
|The sampling distribution of X-bar ||– Has the purpose of any sampling distribution: to obtain probabilities…
– Has the definition of any sampling distribution: the distribution of a statistic.
– Has specific characteristics:
• Mean: μ(X-bar) = μ
• Variance: σ²(X-bar) = σ²/N
• Shape: normal if the population is normal; approximately normal for large N (Central Limit Theorem).|
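The mean and variance characteristics of the sampling distribution of X-bar can be seen by simulation; a Python sketch using a die-roll population (the population, sample size, and repetition count are made up for illustration):

```python
import random

random.seed(0)
pop = [1, 2, 3, 4, 5, 6]                               # population: die faces
mu = sum(pop) / len(pop)                                # mu = 3.5
sigma2 = sum((x - mu) ** 2 for x in pop) / len(pop)     # sigma^2 = 35/12

N = 4                       # sample size
xbars = []
for _ in range(20000):      # many repeated samples -> distribution of X-bar
    sample = [random.choice(pop) for _ in range(N)]     # independent draws
    xbars.append(sum(sample) / N)

mean_xbar = sum(xbars) / len(xbars)                     # close to mu
var_xbar = sum((x - mean_xbar) ** 2 for x in xbars) / len(xbars)
# var_xbar is close to sigma2 / N, matching the card above
```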
|Hypothesis testing ||is the process of testing tentative guesses about relationships between variables in populations. These relationships between variables are evidenced in a statement , a hypothesis, about a population parameter.|
|Test statistic ||a statistic used only for the purpose of testing hypotheses; e.g. zX|
|Assumptions ||conditions placed on a test statistic necessary for its valid use in hypothesis testing;– for zX, the assumptions are that the population is normal in shape and that the observations are independent.|
|Null hypothesis ||the hypothesis that we test; Ho.|
|Alternative hypothesis ||where we put what we believe; H1.|
|Significance level ||The standard for what we mean by a “small” probability in hypothesis testing; α.
– The significance level is the small probability used in hypothesis testing to determine an unusual event that leads you to reject Ho.
– The significance level is symbolized by α.|
|Directional vs. Non-Directional Hypotheses ||• Directional hypotheses specify a particular direction for values of the parameter (< or >).
– IQ of deaf children example: Ho: μ ≥ 100, H1: μ < 100.
• Non-directional hypotheses do not specify a particular direction for values of the parameter (≠).
– Example: Ho: μ = 100, H1: μ ≠ 100.|
|One- and two-tailed tests ||– A one-tailed test is a statistical test that uses only one tail of the sampling distribution of the test statistic.
– A two-tailed test is a statistical test that uses two tails of the sampling distribution of the test statistic.|
|Critical values ||values of the test statistic that cut off α or α/2 in the tail(s) of the theoretical reference distribution.|
|Rejection values ||the values of the test statistic that lead to rejection of Ho|
|p-Value Decision Rules ||• For a directional H1, reject Ho if:
– ½ the SAS p-value < α, and
– the observed zX is in the tail specified by H1.|
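Tying the hypothesis-testing cards together: a Python sketch of a one-tailed z test for the IQ example, with hypothetical numbers (σ = 15, N = 25, X-bar = 94 are assumptions for illustration, not from the cards):

```python
from math import erf, sqrt

def phi(z):
    """P(Z <= z) for a standard normal (replaces the z table)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical setup: test Ho: mu >= 100 against directional H1: mu < 100
mu0, sigma, N = 100.0, 15.0, 25
xbar = 94.0                 # observed sample mean (made up)
alpha = 0.05                # significance level

z_obs = (xbar - mu0) / (sigma / sqrt(N))   # test statistic zX = -2.0
p_one_tail = phi(z_obs)                    # lower-tail probability, ~0.0228

# Decision rule for a directional H1 (one-tailed test): reject Ho if the
# one-tail p-value < alpha AND z_obs is in the tail H1 specifies (lower).
reject = (p_one_tail < alpha) and (z_obs < 0)
print(z_obs, reject)       # -2.0 True
```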