Statistics PPPA 6002

2nd Quiz Study Guide

Question	Answer
Traditionalism	1) Social Science is not a hard Science 2) Humans are too complex for quantification 3) Historical, anecdotal, journalistic approach
Behavioralism (aka Basic Research)	1) There are regularities to permit generalizations 2) Explicit, replicable, neutral methods 3) Priority: hypothesis testing to build theories Goal: highly predictive interlocking theories
Applied Research (Post-Behavioralism or Policy Analysis)	Accepted the merits of explicit, rigorous, replicable scientific methods Changed the goal from building theory to addressing practical/applied/policy questions And acknowledged the role of values in setting research priorities
Classic Model of the Scientific Process	1) Theory 2) Deduce Hypothesis from theory 3) Design Study and operationalize concepts 4) Conduct the Study (collect the data) 5) Analyze data to accept/reject hypothesis 6) Support, modify, or reject initial theory
Model of Applied Research	Begin with specific, practical issue Devise testable research question - Design study and operationalize concepts - Conduct the study (collect the data) - Analyze data to accept/reject hypothesis Use results to inform decision-making
Hypothesis	A testable statement of the relationship between two or more variables
Theory	A set of logically related propositions intended to explain a range of phenomena
Main Structure of Research Reports	Intro (Problem Area; Issues) Literature Review Methodology Findings Discussion and Conclusion
The Strong Lit Review	Primary (not secondary) sources Nonelectronic searches Contact leading researchers Add unpublished/forthcoming research Diagram/model key relationships Use elements of meta-analysis
Meta-Analysis Steps	(1) Clear Statement of Hypothesis (2) Explicit and Replicable Lit Searches (3) Set Variables for Coding Studies (4) Analyze predictors of the results - Certain factors associated with certain outcomes?
Good Individual Questions	Short as possible Shared, simple vocab Unbiased Language/premises Unambiguous Answers Confined to one issue Exhaustive/Exclusive Categories
Good Format and Overall Flow	Brief Smooth Intro Easy Non-threatening start Early closed-ended questions Move from general to specific Delay sensitive issues until later Demographics last Fair Framing Short transitions Consistent series answer format
Census vs Sample	Use Census if feasible, affordable and not often; but samples usually more practical
Random vs Nonprobability	Use random samples unless desperate
Nonprobability Sampling	Convenience Purposive Snowball
Random sampling includes	Simple; Systematic (every nth); Stratified (proportionate or disproportionate)
Simple Random Sampling	Each case chosen independently and randomly from the sampling frame
Systematic	Selecting every nth item from a list (from a random point)
Stratified	Draw random samples within groups if easier or to oversample a group intentionally. Proportionate or Disproportionate
Response Rate Determinants	Costs - Est. Lengths / Time / Complexity Benefits - Enjoyable / Important/ Satisfaction
Evaluating a Sample Size	Overall precision (CI) needed Depth of Subgroup analysis As well as the research budget
95% Confidence Interval - Sample 100	+/- 10%
95% Confidence Interval - Sample 600	+/- 4%
95% Confidence Interval - Sample 1100	+/- 3%
Nominal	Categories by names only (region, religion, sex)
Ordinal	Categories can be ordered on a single dimension (agree/disagree; highest degree earned; young, middle, old)
Interval	Increments are consistent but no absolute zero (Fahrenheit, year of birth)
Ratio	Absolute quantities (amount of dollars, inches, siblings, years, pounds). Ask yourself... can it be TWICE AS MUCH?
Principles of Data Analysis	(1) Good Data are a prerequisite (2) All Statistics are reductionist (3) Context dictates interpretation (4) Avoid Exaggerating small gaps (Bill hates this!) (5) Correlation DOES NOT equal Causation (6) Start with Univariate Analysis
Univariate Nominal Variables	Mode = Plurality but not always a majority Percentages = usually round %
Univariate Nominal Variables - Interpretation Pitfalls	Misleading Pictograms Confusing absolute and relative % Misinterpreting nominal modes as if they were midpoints/averages Misleading/simplified composites from nominal and other modes
n	Univariate Sample size
N	Univariate population size
Measures of Central Tendency	(1) Mean (2) Median (3) Trimmed Means
Mean	Sum divided by # of cases; very sensitive to extreme values. x̄ (x with a line on top) is the sample mean; μ (mu, which looks like a u) is the population mean
Median	50th percentile; half of the cases fall below it and half above; insensitive to extreme high and low values
Trimmed Means	Discard a set percentage of the highest and lowest values (e.g., the top and bottom five percent) before averaging; used in Olympic scoring
Measures of Dispersion	(1) Range (2) Standard Deviation (3) Interquartile Range
Range	Highest to lowest value; crude measure of dispersion
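Standard Deviation (Equation)	Square root of (the sum of the squared differences of each case from the mean, divided by the number of cases): sqrt(sum((x - mean)^2) / N). When a sample is used to estimate the population value, divide by n - 1 instead.
Standard Deviation	In a normal curve, the mean plus or minus one standard deviation spans the middle 68% of cases; otherwise it only indicates relative dispersion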
IQR	25th to 75th percentiles; range of the middle 50% of all cases; easy to explain.
Smaller IQR/SD Scores	Tight cluster of cases
Measure of Shape	Skewness
Skewness	Asymmetrical distribution skewed positively if a few high scores pull the mean above median; reverse (mean below the median) reflects a negative skew.
The Normal Curve	The Bell Shaped Curve Central Limit Theorem
+/- 1 Std Dev	68.3% of all cases
+/- 2 Std Dev	95.4% of all cases
+/- 3 Std Dev	99.7% of all cases
Descriptive Statistics	Data of the whole relevant population - treat results as real.
Inferential Statistics	Used with sample because results are estimates. Keeps us from jumping to conclusions and treating sample estimate as more precise than they really are.
Population based statistics are...	Descriptive Only
Sample based statistics are...	Inferential and descriptive
Formula for 95% CI around a proportion...	1.96 multiplied by the square root of (P multiplied by (1 minus P), divided by sample size): 1.96 x sqrt(P x (1 - P) / n)
Confidence Intervals for Means Formula	Std Dev of sample divided by the square root of sample size, then multiplied by 1.96: 1.96 x (s / sqrt(n))
When to use T-Test	Comparing means of two groups... (1) using sample data (derived from random sampling) (2) using experimental data (derived from random assignment)
T-Test Steps	(1) State the Null Hypothesis (2) State Research Hypothesis (3) State Decision Rule (Probability Level) (4) Assume Equal Variance - Unless F-Test is significant (5) Reject or fail to reject the null
Easiest Null for T-Tests	There is no difference in the mean (dependent variable) of (group 1) and (group 2)
T-Test Interpretation	(1) Prevents 'jumping to conclusions' when differences in two means may just be random variation (2) Statistical significance is not the same as substantive significance (3) Easy to get stat. sig. with large samples, hard with small samples
T-Test and Population studies without randomized data	No need for T-Test, because it is inferential.
Difference in steps between Chi Square and T-Test	T-Test adds the F-test step.
Similarities between Chi Square and T-Test	(1) Stat. Sig. does NOT necessarily mean it is important or consequential. (2) If NOT stat. sig. remember we never prove the null we just fail to reject the null. (3) A small sample may not be Stat Sig, but could be Stat Sig in a larger sample
Three Elements of Causal Inference	(1) X & Y covary (2) X precedes Y (3) Rule out the Z's
Post Hoc Fallacy	Fallacy of concluding that since change in Y followed X, it was caused by X.
Antecedent Variables	Before X (Z->X->Y)
Intervening Variables	Between X and Y (X->Z->Y)
Campbell and Stanley's Notation System	O = Observations (measures) of Y Left to Right = Chronological Order Each Row = One Group of Subjects
Single Group posttest only	X O
Single Group pretest-posttest (before and after design)	O X O
Static Group Design	X O ----- O
History	External event during period
Maturation	Subjects change over time
Practice	Familiarity with the measure
Instrumentation	A changed measure
Regression to the mean	If subjects are chosen due to extreme scores, they tend to regress to the mean on posttest
Selection	Groups different from start
Intragroup history	unique group event
Mortality	groups differ in attrition
What to do with Attrition...	(1) Omit pretest scores of lost subjects; (2) Omit all data of lost 'types' from all groups (3) Match by statistical weighting (4) Analyze by "intention to treat" (i.e. include dropouts)
Between Group Reactivity	(1) Spillover (My buddy is sick and I know if I give him a lime he will get better) (2) Compensatory rivalry (controls try harder) (3) Resentful demoralization (Controls try less...I never get picked so I will just suck)
Placebo Effect	Subject expectancy to get better and psychologically they do. (Reactivity)
Novelty Effect	X works because it's new. Innovation effect. Short-term effect. (Reactivity)
Guinea Pig Effect	Subjects act differently because they feel that they are under surveillance. Evaluation Apprehension - I know I am under
Demand Effect	Subjects think they know what the authority wants of them. The real pills are handed out with more conviction; a double-blind design is required to limit this.
Social desirability	Reflexivity - Political Correctness, Societal pressures/inhibitions, I am supposed to act a certain way.
Hawthorne Effect	Electric Plant Light Dimming Example. Refers to reactivity in general.
Heisenberg Effect	Act of measuring something changes what you're measuring
Two Elements of a true experiment	(1) Random Assignment of subjects to groups (2) Random Assignment of Treatments to groups
Source of power of experiments	Comparability of the groups - the only real difference is one gets X, the other doesn't. Otherwise the two groups are identical.
Classic Experimental Design	R) O X O R) O O
Posttest Only Experiment	R) X O R) O
Factorial Design	R) O Xa Xb O R) O Xa O R) O Xb O R) O O
Complex X	Many ingredients in X
Multiple Ys	Studies often measure the impact of X on several Ys.
Compensatory Rivalry	Controls try harder
Resentful demoralization	Controls try less
Spillover effects/diffusion	Some X spills over to controls
Strategies to minimize reactivity	(1) deceit (2) obscure / mislead (3) use placebo (4) double blind (5) time (hope they forget the study)
Placebo	A dummy treatment given to the controls to 'hold constant' the impact of their expectations. Common in medical studies; not always possible.
Natural Experiment	Both subjects and X were randomly assigned without a researcher's intervention; term is also sometimes used less strictly to refer to a close natural approximation even if lacking in randomization
Big Four Categories of Validity	(1) Measurement Validity (2) Internal Validity (3) Statistical Conclusion Validity (4) External Validity
External Validity	Generalizability; the essential yet unavoidably subjective judgment about the extent to which it is reasonable to generalize/extrapolate the findings of one study to other places, subjects, times, etc.
How to strengthen external validity	(1) Test subjects representative of the subjects you want to generalize to (2) replications in varied settings (3) Consistent results in varied tests
Limitations of Experiments	(1) Unethical or illegal to withhold X (2) Unethical or illegal to risk trying X (3) Unaffordable to finance in field (4) Infeasible to enforce X vs no X (5) Impractical to field test outside a lab
Quasi-Experimental Designs	Commonly means any clever design lacking randomized control groups
Causal-Comparative Designs	Studies that seek to infer causality using comparison groups without randomly assigned subjects
Primary threat of Internal validity when no randomization	Selection
NEC	Nonequivalent Comparison Group Design
Nonequivalent Comparison Group Designs	O X O ----- O O
Retrospective match / Ex post facto design	Creating a comparison group later by finding and matching subjects similar to those who previously got exposed to X.
Time Series Designs	X may be short term or enduring. Top internal validity threat is history. Trend line makes it superior to O X O.
Simple Interrupted Time Series	O O O O O O X O O O O O O
Reiterative Time Series	O O X O O X O O X O O
Comparison Time Series	O O O O O X O O O O O --------------------- O O O O O O O O O O
Multiple Time Series	O O X O O O O O O --------------------- O O O O X O O O O --------------------- O O O O O O X O O ---------------------- O O O O O O O O
Panel	Repeated data tracking same people; valuable but expensive, can produce reactivity
Cross-sectional data	Time series with new random samples from same population. Shows net change but masks the rest.
Deceptive Time Series Charts	Using a truncated base plus narrow or wide axis.
Retrospective pretests	Proxy pretests - recollections used for pretest measure.
Danger of time series inferences from a single survey	Cannot infer age = time. Bill used the Navy officer surveys of high-ranking and low-ranking officers, where one cannot infer that low-ranking officers will think like high-ranking officers when they get there.
Correlational Designs	Typically using a single survey to try to "statistically control" for alternative explanations, often using multiple regression. Issues with selection.
Aggregate Data	Units of analysis are groups, such as precincts, cities, states.
Ecological Fallacy	Drawing individual-level inferences from aggregate-level correlations.
Check list of Empirical Studies	(1) Theory Building or Applied Research (2) Causal or Descriptive (3) Exact Hypothesis (4) Independent Variable(s) (5) Dependent Variable(s)
When Something is NOT Statistically Significant	Do not bring it up. Consider the dispersion between the groups.
T-Test Analysis	Analysis is black and white, it is or it isn't stat. sig. If you hit .05, you have a slight relationship. State just that, a slight relationship.
Grouping Ratios	Becomes Ordinal
Central Tendency	Mean, Median, Trimmed Mean
Extreme Lopsided Distribution does what to Confidence Intervals?	Makes them smaller (narrower), because P x (1 - P) shrinks as P moves away from 0.5
At what level is .012 statistically significant?	It is Stat. Sig at .05, but NOT at .001 or .01.
True or False - Standard Deviation is a measure of Central Tendency?	False
What is the biggest threat to NEC design?	Selection
What is the biggest threat to Time Series Designs?	History
What does comparing results to good existing records test?	Concurrent Validity
What are two elements of dispersion?	IQR and SD
Two Types of Empirical Validity	(1) Concurrent Validity (2) Predictive Validity
Concurrent Validity	Testing a measure against existing data believed accurate. (Empirical)
Predictive Validity	Testing a measure designed to predict future outcomes by the actual success of its forecasts. (Empirical)
Subjective Validity	(1) Face Validity (2) Content Validity
Face Validity	Operationalizing the usual usage of a word in a reasonable way.
Content Validity	Operationalizing the full scope of the entire intended concept and not just a part of it.
Multiple Measures (Triangulation)	Assessment using a variety of indicators (not just one)
Unobtrusive Measures	No survey - Measuring actual behavior - not just self-reported behavior.
Validity	Accuracy
Reliability	Consistency
According to SPSS, Scale measurements are...	Interval and Ratio
Content Analysis Steps	(1) Define exact scope of the study (dates, sources, search strategy); (2) Operationalize variables to code; (3) Refine coding system & test reliability; (4) Code the content under study; (5) Analyze Patterns
Is Content Analysis Descriptive or Causal?	By itself it's descriptive. If part of a study it can be Causal.
Intercoder Reliability Test	Independent coders (at least 2) evaluate a characteristic of a message or artifact and reach the same conclusion. Must have at least an 80% agreement rate.
What to worry about in analyzing patterns in Content Analysis...	Caution in drawing inferences.
Types of Operationalize Variables to Code	(1) Specific Word Count; (2) Sources Quoted; (3) Topics; (4) Overt Visual Image; (5) Voice Inflections; (6) Subtle Themes; (7) Global Code
Uses in Content Analysis	History, Public Relations, National Intelligence, Lobbying, Detective Work, Mass Communication, Linguistics
Content Analysis	Systematic analysis of patterns in communications
When to use inferential Stats?	Randomized - ALWAYS! Population - Use if group can be used as a sample.
Qualitative Research	More exploratory, small purposive "samples", open-ended semi-structured interviews, more time per subject, narrative format, note researcher's impact.
Quantitative Research	More defined, specific hypothesis testing, large random samples, close-ended instruments, less time per subject, data-based reports, distant/unacknowledged.
Matching Qualitative and Quantitative	Start with Qualitative research to define the issues/vocabulary, to help generate/refine research questions, test a draft questionnaire. Then conduct quantitative study. Use qualitative to explore puzzles found.
Purpose of Focus Groups	In-depth probing of views (pre-existing); Reactions to new stimuli (new responses); Group brainstorming (new idea generation);
Focus Groups Format	Recruit relevant participants; 10-12 people, 1.5 to 2 hours long, audio/video taped, semi-structured format w/ open ended agenda questions, neutral moderator.
The right number of Focus Group meetings	Depends on resources, how much is at stake, but at least more than one!
Bivariate Regression	One X, Correlation Coefficient = r, Coefficient of Determination = r^2, Y = a + bX
Multiple Regression	Two or more Xs, Multiple Correlation Coefficient = Multiple R, Multiple Coefficient of Determination = Multiple R^2, Y = a + b1X1 + b2X2 + ... + bkXk
Y = a + bX	a=intercept; b=slope
Multiple Correlation Coefficient	Multiple R
Multiple Coefficient of Determination	Multiple R (squared)
Unstandardized Coefficients in Multiple Regression Equations	Symbol: b; unstandardized partial regression coefficient/slope; slope change measured in original units
How to interpret Unstandardized Coefficients in Multiple Regression Equations	b is the change in predicted Y for each one-unit increase in that X, holding the other Xs constant. E.g., if b is -3 with Y in years and X in packs of cigarettes, subtract 3 years for every pack of cigarettes.
Standardized Coefficients in Multiple Regression Equations	Symbol: β (Greek beta); beta, beta weight, or standardized partial regression coefficient/slope; in units standardized as Z-scores (Std. Dev. units) to allow comparisons.
How to interpret Standardized Coefficients in Multiple Regression Equations	Use for ranking variables: the larger the beta in absolute value, the more powerful the X.
Multicollinearity	Overlap of variables
Dummy Variable	A dichotomous variable coded 0/1 so that a categorical characteristic can enter a regression equation; the category coded 0 serves as the reference (omitted) group and is not separately calculated.
r	Correlation Coefficient
Correlation Coefficient (r)	Summarizes the strengths of the linear relationship between two scale variables. Perfect Positive Correlation 1.0 (Left up to Right); Perfect Negative Correlation -1.0 (Left down to Right). 0 = No correlation.
r(squared)	Coefficient of Determination
Coefficient of Determination (r^2)	Indicates strength of relationship but has no negative sign. Yields lower but more intuitive score.
Role of Correlation Coefficient and Coefficient of Determination	Both summarize (in slightly different ways) the strength of the relationship between two scale variables. Neither is inferential.
Feature of Correlation Coefficient	Shows strength and direction, though somewhat inflated.
Feature of Coefficient of Determination	Shows strength and proportion of variation explained, but lacks direction sign.
Homoscedasticity	Even variation around the slope (Homo is straight)
Heteroscedasticity	Uneven Variation on the slope (Hetero is balled up)
Bivariate Analysis of Outliers	Could be bad data, but may provide lesson learned data for how to do it right or very bad.
Standard Error of the Estimate (SEE)	The standard deviation of cases around the regression line; lines drawn one SEE above and below the regression line show where about 68% of cases fall.
Is Standard Error of the Estimate Inferential?	Not just no, but hell no!
Aggregate Data	Units of analysis are collectivities (i.e. counties, states, countries)
Ecological Fallacy	Drawing individual-level inference from a pattern in aggregate data.
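
Worked example for the formula cards above (standard deviation, the two 95% confidence interval formulas, and r / r^2). This is only a minimal sketch in Python, not part of the original study guide. The function names are hypothetical, the data lists are made up for illustration, and the sample-size check assumes P = 0.5 (the widest interval), which the cards do not state.

```python
import math

Z_95 = 1.96  # multiplier the cards use for 95% confidence


def population_sd(values):
    """Square root of (sum of squared deviations from the mean / number of cases)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def proportion_margin(p, n, z=Z_95):
    """Half-width of the CI around a proportion: z * sqrt(p * (1 - p) / n)."""
    return z * math.sqrt(p * (1 - p) / n)


def mean_margin(sample_sd, n, z=Z_95):
    """Half-width of the CI around a mean: z * sd / sqrt(n)."""
    return z * sample_sd / math.sqrt(n)


def pearson_r(xs, ys):
    """Correlation coefficient r for two equal-length lists of scale values."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Assumption: P = 0.5, the most evenly split (widest-interval) case.
for n in (100, 600, 1100):
    print(n, f"+/- {proportion_margin(0.5, n):.1%}")

# A lopsided split (P = 0.9) gives a narrower interval at the same n.
print("P = 0.9, n = 100:", f"+/- {proportion_margin(0.9, 100):.1%}")

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up illustrative data
print("SD:", population_sd(scores))

hours = [1, 2, 3, 4, 5]          # made-up X values
grades = [52, 60, 61, 70, 77]    # made-up Y values
r = pearson_r(hours, grades)
print("r:", round(r, 3), "r^2:", round(r * r, 3))
```

With P = 0.5 the loop prints 9.8%, 4.0%, and 3.0%, which round to the +/- 10%, +/- 4%, and +/- 3% listed on the sample-size cards. Note that r^2 always comes out no larger in magnitude than r and carries no sign, as the cards on the two coefficients describe.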

Created by: jellosix
