# Basic Stats

### For Life Sciences

Question | Answer |
---|---|

a statistic is? | a summary of data |

the field of statistics is? | the collection, analysis and understanding of data measured with uncertainty |

what is a categorical variable? | one which is measured descriptively eg: hair colour or major at university |

what is a quantitative variable? | one which is measured numerically, e.g. the time it takes to get home from work |

graphical summary of one categorical variable? | bar graph |

graphical summary of one quantitative variable? | histogram or boxplot |

how to graphically summarise relationship between two categorical variables | clustered bar chart or jittered scatterplot |

how to graphically summarise relationship between two quantitative variables | scatterplot |

how to graphically summarise relationship between one categorical and one quantitative variable | comparative boxplots or comparative histograms |

what to look for in a graph | location, spread, shape, unusual observations |

define 'location' graphically | where most of the data lies |

define 'spread' graphically | variability of the data, how far apart or close together it is |

define 'shape' graphically | symmetric, skewed, etc. |

how to numerically summarise one categorical variable | table of frequencies or percentages |

how to numerically summarise one quantitative variable | location: mean or median; spread: standard deviation or interquartile range (IQR) |

formula for mean? | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; preferable for approximately normal data |

formula for median? | M = the middle value of the ordered data (n odd), or the average of the two middle values (n even); less affected by outliers, therefore used for data with outliers |

formula for standard deviation? | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$; preferable for approximately normal data |

formula for interquartile range? | $\mathrm{IQR} = Q_3 - Q_1$; less affected by outliers, therefore used for data with outliers |

which numbers are needed to create a five number summary? | minimum, Q1, median (sometimes mean included), Q3, maximum |

an outlier is? | an observation more than 1.5 x IQR below Q1 or more than 1.5 x IQR above Q3 |
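The numerical summaries on the cards above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module. The data values and the helper name `summarise` are made up for illustration, and the quartiles follow that module's default convention, which may differ slightly from a hand method taught in class.

```python
import statistics

def summarise(data):
    """Return location, spread and 1.5 x IQR outliers for a list of numbers."""
    # Quartiles by the module's default ('exclusive') method; other
    # conventions can give slightly different Q1 and Q3.
    q1, q2, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "sd": statistics.stdev(data),  # sample sd, divides by n - 1
        "five_number": (min(data), q1, q2, q3, max(data)),
        "iqr": iqr,
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

# Hypothetical travel times in minutes, with one unusually long trip.
travel_times = [2, 4, 4, 5, 7, 9, 30]
summary = summarise(travel_times)
for name, value in summary.items():
    print(name, value)
```

In this made-up data set the single long trip pulls the mean above the median, which illustrates why the median and IQR are preferred when outliers are present.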

define linear transformation | a change of a variable from x to xnew of the form xnew = a + bx, where a and b are constants |

examples of linear transformation use | change of units; standardising to 'z' scores when using the normal assumption |

formula for linear transformation? | $x_{new} = a + bx$ |

formula for new mean once linear transformation has occurred? | $\bar{x}_{new} = a + b\bar{x}$ |

formula for new median once linear transformation has occurred? | $M_{new} = a + bM$ |

formula for new standard deviation once linear transformation has occurred? | $s_{new} = \lvert b \rvert s$ (simply $bs$ when b is positive) |

formula for IQR once linear transformation has occurred? | $\mathrm{IQR}_{new} = \lvert b \rvert\,\mathrm{IQR}$ (simply $b\,\mathrm{IQR}$ when b is positive) |
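The linear transformation rules can be verified numerically. This is a small sketch, assuming a Celsius to Fahrenheit conversion (a = 32, b = 1.8) as the change of units. The temperatures and the helper `iqr` are hypothetical. It also checks that spread scales by the size of b (its absolute value) when b is negative.

```python
import math
import statistics

def iqr(data):
    """Interquartile range using the statistics module's default quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

a, b = 32.0, 1.8                          # xnew = a + b * x
celsius = [10.0, 12.5, 15.0, 20.0, 30.0]  # hypothetical temperatures
fahrenheit = [a + b * x for x in celsius]

checks = {
    # Measures of location are shifted by a and scaled by b.
    "mean": math.isclose(statistics.mean(fahrenheit),
                         a + b * statistics.mean(celsius)),
    "median": math.isclose(statistics.median(fahrenheit),
                           a + b * statistics.median(celsius)),
    # Measures of spread ignore a and are scaled by the size of b.
    "sd": math.isclose(statistics.stdev(fahrenheit),
                       abs(b) * statistics.stdev(celsius)),
    "iqr": math.isclose(iqr(fahrenheit), abs(b) * iqr(celsius)),
}

# With a negative multiplier the spread is still positive.
flipped = [-2.0 * x for x in celsius]
checks["sd_negative_b"] = math.isclose(statistics.stdev(flipped),
                                       2.0 * statistics.stdev(celsius))
print(checks)
```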

explain density curves | the area under the curve over any range of values is the proportion of all observations of a quantitative variable that fall within that range; like a smoothed-out histogram; describes probabilistic behaviour |

total area under density curve equals? | 1 |

explain the normality assumption | a normal curve can be used if the histogram looks like a normal curve (symmetric, bell-shaped, tailing off towards 0 at both ends); the assumption is then termed 'reasonable' |

how does a normal quantile plot confirm the normality assumption? | if in a straight line, or close to it, then normal and assumption is reasonable |

define the 68-95-99.7 rule | for normally distributed data, about 68% of observations will be within 1 standard deviation of the mean; about 95% within 2 standard deviations of the mean; about 99.7% within 3 standard deviations of the mean |

symbol for mean of a density curve? | μ |

symbol for standard deviation of density curve? | σ |

normal distribution shorthand | written X ~ N(μ, σ); X = random variable; N = normal distribution; first number in brackets = mean; second number in brackets = standard deviation |

explain the standard normal variable | Z ~ N(0, 1), a normal variable with mean 0 and standard deviation 1; probabilities are set out as e.g. P(Z < z); each corresponds to the area under the standard normal curve over that region; tabulated values are the area to the left of z |

use of the standard normal distribution table | to find P: look up z using the row and column headings of the table; to find z: locate P in the body of the table; table ordered from smallest to largest |

reverse use of the standard normal distribution table | e.g. set out as P(Z < c) = p with p known; find p in the body of the table and read c off the headings; c is the value with area p to its left |

X ~ ? | N(μ, σ) |

formula and use of standardising transformation | $Z = (X - \mu)/\sigma$; used when the distribution of X is not N(0, 1), so that X is converted to a standard normal variable before the table is used |
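The standardising transformation and the table lookups can be sketched in code. The example below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; its `cdf` plays the role of the forward table lookup and `inv_cdf` the reverse lookup. The mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0   # hypothetical N(mu, sigma)
x = 130.0

# Standardise: how many standard deviations is x above the mean?
z = (x - mu) / sigma

standard = NormalDist()                   # N(0, 1)
p_from_z = standard.cdf(z)                # P(Z < z), area to the left of z
p_direct = NormalDist(mu, sigma).cdf(x)   # P(X < x), same area

# Reverse lookup: which c gives P(Z < c) = 0.975?
c = standard.inv_cdf(0.975)

# 68-95-99.7 rule: area within k standard deviations of the mean.
within = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}

print(f"z = {z}")
print(f"P(X < {x}) = {p_direct:.4f}")
print(f"c = {c:.3f}")
for k, area in within.items():
    print(f"within {k} sd: {area:.4f}")
```

Standardising first and then using N(0, 1) gives the same probability as working with N(μ, σ) directly, which is why a single standard normal table is enough.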

relationships between variables best explored through? why? | scatterplot; can get a sense for the nature of the relationship |

how to define the nature of relationship? | existent/ non-existent; strong/ weak; increasing/ decreasing; linear/ non-linear |

outliers in scatterplots? | represent some unexplainable anomalies in data; could reveal possible systematic structure worthy of investigation |

define causal relationship | relationship between two variables where one variable causes changes to another |

define the explanatory variable | explains or causes the change; written on x-axis |

define the response variable | that which changes; written on y-axis |

useful numbers for two quantitative variables? | correlation or regression |

formula for the correlation coefficient? | $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ |

define xi or yi | the individual observed values of the variable on the corresponding axis |

define xbar and ybar | the mean of the values of the variable on the corresponding axis |

define sx and sy | the standard deviation of the values of the variable on the corresponding axis |

state the properties of r | is the correlation coefficient; numerically expresses the strength and direction of a linear relationship; always between -1 and 1; if close to 1 = strong positive linear relationship; if close to -1 = strong negative linear relationship; close to 0 = weak or non-existent linear relationship |

state the cautions about the use of r | only useful for describing linear relationships; sensitive to outliers |

what is least squares regression used for? | to explain how a response variable is related to an explanatory variable; positive slope = response increases as the explanatory variable increases; negative slope = response decreases |

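mathematical representation of regression | $b_1 = r\,\frac{s_y}{s_x}$; $b_0 = \bar{y} - b_1\bar{x}$; $\hat{y} = b_0 + b_1 x$ |

facts about b1 | b1 is the slope of the fitted line; it has the same sign as the correlation coefficient r, but equals r only when sx = sy |

how to determine the strength of a regression | $r^2 = s_{\hat{y}}^2 / s_y^2$, the variance of the fitted values divided by the variance of y; r-squared is the proportion (%) of variation in y explained by the linear regression |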
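The correlation and regression formulas can be worked through on a small data set. The following is a sketch only: the function names `correlation` and `least_squares` and the data values are hypothetical. It computes r from standardised values, takes the slope as r times sy/sx, takes the intercept as the mean of y minus the slope times the mean of x, and checks that r squared equals the variance of the fitted values over the variance of y.

```python
import math
import statistics

def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardised x and y values."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)

def least_squares(xs, ys):
    """Return (b0, b1, r) for the fitted line yhat = b0 + b1 * x."""
    r = correlation(xs, ys)
    b1 = r * statistics.stdev(ys) / statistics.stdev(xs)   # slope
    b0 = statistics.mean(ys) - b1 * statistics.mean(xs)    # intercept
    return b0, b1, r

xs = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical explanatory values
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical responses
b0, b1, r = least_squares(xs, ys)

# Strength of the regression: share of the variation in y explained.
fitted = [b0 + b1 * x for x in xs]
r_squared = statistics.variance(fitted) / statistics.variance(ys)

print(f"r = {r:.4f}, slope = {b1:.3f}, intercept = {b0:.3f}")
print(f"r squared = {r_squared:.4f}")
```

Note that the slope here is close to 2 while r is close to 1, so the slope and the correlation coefficient are different quantities even though they share a sign.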

state the basic regression assumptions | $y = b_0 + b_1 x + \text{error}$; the errors average 0; error corresponds to random scatter about the line; this is checked by residual plots |

formula for residuals? | residual = $y - \hat{y}$ (observed value minus fitted value) |

residual plot is a scatter plot of? | residuals(y axis) against explanatory variable(x axis) |

interpreting residual plots | focus on pattern; there should be no pattern; if there is a pattern then the linear assumption is incorrect |
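A short sketch can show what a pattern in the residuals looks like. Here a straight line is fitted to deliberately curved, made-up data (y = x squared) and the residuals are printed rather than plotted. The helper `fit_line` is hypothetical and computes the least squares slope as the sum of cross-products over the sum of squared x deviations, which is equivalent to r times sy/sx.

```python
import statistics

def fit_line(xs, ys):
    """Least squares intercept and slope for yhat = b0 + b1 * x."""
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]   # deliberately non-linear data

b0, b1 = fit_line(xs, ys)

# Residual = observed y minus fitted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}  residual = {e:+.1f}")
```

The residuals are positive at both ends and negative in the middle, a curved pattern indicating that the linear assumption does not suit these data. Random scatter about 0 with no such pattern is what a reasonable linear fit would show.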

what to do if any residuals stand out? | they are either an influential point and to be left alone; or they are an outlier and to be removed if affecting results too much |

how to attach special cause to an outlier | check whether it is a recording error; refit the line; if removing it, justify why (down-weight its influence) |

translated residuals (removing the outlier) should have what effect? | spread pattern |

any points on the 0 line of a residual plot are? | points with a residual of 0, i.e. they lie exactly on the fitted line |

if a parabola appears after outlier removal? | the linear (straight-line) assumption is not appropriate |

if spread doesn't vary far from 0? | there is no pattern |

when to remove outlier | if influences results |

when will an outlier not influence results? | when close to the mean; it will then have little influence on the gradient and intercept of the fitted line |

what are lurking variables? | variables that can influence results which have not been taken into account |

to account for lurking variables you? | analyse the covariance |

state the strategy for using data in research? | identify question to be answered; identify population studied; locate variables: which one is IV and DV, explanatory and response; obtain data which answers question |

define anecdotal data | haphazard collection of data; unreliable for drawing conclusions |

define available data | use of data that has come from another source possibly obtained for a reason other than the one you intend to use it for |

define collect your own data | use of a census, a survey, or observations from an experiment |

define census | use of whole population to obtain data |

define sample | use of a randomised selection of the population to represent the whole; smaller and easier to do than a census |

explain observational study | no variables are manipulated or influenced; data obtained from population as it is |

explain experiment | variables are influenced or manipulated so that responses can be noted and recorded; usually a control group is utilised; control group = does not undergo treatment and acts as a comparison group |

explain causation | a response that is the result of another variable eg: moon's movements CAUSE the tides |

common response in terms of variables means? | a third (lurking) variable causes changes in both observed variables; the two observed variables are therefore associated with one another without one causing the other |

causation in terms of variables means? | explanatory variable causes response variable; response variable and explanatory variable are associated |

confounding in terms of variables means? | two or more explanatory variables are present and associated with one another; any of them could have caused the response, by themselves or together, so their effects cannot be separated; explanatory variables called confounded causes |

why an experiment? | allows demonstration of causation; intervention can be used to determine whether or not effect is present |

state the principles of experiment design | subjects, treatment, factor, levels, response variable |

definition of subjects in terms of experiment design | things upon which experiment is done; eg: people, animals, chemicals etc |

definition of treatment in terms of experiment design | circumstances which are applied to subjects; e.g. given medication |

definition of factor in terms of experiment design | the explanatory variables that differ between treatments; e.g. type of drug (medication or placebo) |

definition of levels in terms of experiment design | the specific values each factor takes; a treatment is formed by a particular combination of factor levels; e.g. dosage of medication and number of doses per day vs dosage of placebo and number of placebos taken |

definition of response variable in terms of principle of experiment design | the variable which will answer the question; the variable of most interest, measured on each subject after treatment |

explain a table summarising the principles of an experiment | one factor along the rows and one along the columns; levels in the first column and first row; rest of table = number of subjects allocated to that particular treatment group |

state the three principles all experiments must follow | compare two or more treatments where one is the control; random assignment of subjects to treatments; repeat the experiment on numerous subjects (to reduce chance variation in the results) |

how to randomise | allocate all subjects a random number; order subjects in accordance to those random numbers (smallest to largest, or largest to smallest); form treatments by selecting subjects in a systematic pattern applied to the random numbers representing subjects |
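The randomisation steps can be sketched as follows. This is an illustrative outline only: the function name `randomise`, the subject labels, and the treatment names are hypothetical, and the fixed seed is there just to make the example repeatable. Each subject gets a random number, subjects are ordered by that number, and treatments are dealt out in a repeating pattern down the ordered list.

```python
import random

def randomise(subjects, treatments, seed=None):
    """Randomly allocate subjects to treatments in near-equal groups."""
    rng = random.Random(seed)
    # Step 1: allocate every subject a random number.
    keyed = [(rng.random(), subject) for subject in subjects]
    # Step 2: order subjects by their random numbers (smallest first).
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: form groups with a systematic pattern over the ordered
    # list, here by cycling through the treatments in turn.
    groups = {treatment: [] for treatment in treatments}
    for position, (_, subject) in enumerate(keyed):
        treatment = treatments[position % len(treatments)]
        groups[treatment].append(subject)
    return groups

subjects = [f"subject_{i}" for i in range(1, 9)]   # 8 hypothetical subjects
treatments = ["medication", "placebo"]
groups = randomise(subjects, treatments, seed=1)

for treatment, members in groups.items():
    print(treatment, members)
```

Because the pattern is applied to the random ordering rather than to the subjects themselves, which subject lands in which group is decided by chance and not by the experimenter.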

define control group | different from all other treatment groups in that it receives no treatment or only a placebo (sham) version of it; is the group that the results are compared against |

explain random comparative experiment | subjects randomly allocated one of several treatments; responses compared across treatment groups |

explain matched pairs design | break subjects with similar properties into pairs; one of two treatments applied to one of each pair; can produce more precise results; used in before and after, and twin studies |

explain randomised block design | block = group of subjects known before the experiment to be similar in some way that would affect the response; randomised assignment of treatments to subjects within each block; matched pairs is a special case of this |

experimental caution: appropriate control | only variant across treatment(s) is/are factor(s) |

experimental caution: beware of bias | administrator of experiment can show bias towards giving certain treatments to certain subjects; a double-blind design accounts for this: neither subject nor administrator knows which treatment is applied |

experimental caution: repetition of entire subjects | all steps for experiment are performed for all subjects in all treatments |

experimental caution: realistic experiment | experiment needs to duplicate real-world conditions |

Created by:
Nymphette