Basic Stats
For Life Sciences
| Question | Answer |
|---|---|
| a statistic is? | a summary of data |
| a field of statistics is? | the collecting, analysing and understanding of data measured with uncertainty |
| what is a categorical variable? | one which is measured descriptively eg: hair colour or major at university |
| what is a quantitative variable? | one which is measured numerically, eg: time it takes to get home from work |
| graphical summary of one categorical variable? | bar graph |
| graphical summary of one quantitative variable? | histogram or boxplot |
| how to graphically summarise relationship between two categorical variables | clustered bar chart or jittered scatterplot |
| how to graphically summarise relationship between two quantitative variables | scatterplot |
| how to graphically summarise relationship between one categorical and one quantitative variable | comparative boxplots or comparative histograms |
| what to look for in a graph | location, spread, shape, unusual observations |
| define 'location' graphically | where most of the data lies |
| define 'spread' graphically | variability of the data, how far apart or close together it is |
| define 'shape' graphically | symmetric, skewed etc |
| how to numerically summarise one categorical variable | table of frequencies or percentages |
| how to numerically summarise one quantitative variable | location: mean or median; spread: standard deviation or inter quartile range |
| formula for mean? | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; preferable for approximately normal data |
| formula for median? | M = middle value of the sorted data when n is odd, or the average of the two middle values when n is even; less affected by outliers, therefore used for outlier-ridden data |
| formula for standard deviation? | $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$; preferable for approximately normal data |
| formula for inter quartile range? | Q3 - Q1= IQR; less affected by outliers therefore used for outlier ridden data |
| which numbers are needed to create a five number summary? | minimum, Q1, median (sometimes mean included), Q3, maximum |
| an outlier is? | an observation more than 1.5 x IQR below Q1, or more than 1.5 x IQR above Q3 |
| define linear transformation | transformation of a variable from x to xnew by multiplying by a constant and adding a constant |
| examples of linear transformation use | change of units; standardising to find 'z' scores under the normal assumption |
| formula for linear transformation? | $x_{\text{new}} = a + bx$ |
| formula for new mean once linear transformation has occurred? | $\bar{x}_{\text{new}} = a + b\bar{x}$ |
| formula for new median once linear transformation has occurred? | $M_{\text{new}} = a + bM$ |
| formula for new standard deviation once linear transformation has occurred? | $s_{\text{new}} = \lvert b \rvert s$; adding a does not change the spread |
| formula for IQR once linear transformation has occurred? | $\text{IQR}_{\text{new}} = \lvert b \rvert \, \text{IQR}$ |
| explain density curves | area under the curve in any range of values is the proportion of all observations that fall within that range for a quantitative variable; like a smoothed-out histogram; describes probabilistic behaviour |
| total area under density curve equals? | 1 |
| explain the normality assumption | a normal curve can be used if the histogram looks like a normal curve (symmetric, bell-shaped, tailing off towards 0 at both ends); the assumption is then termed 'reasonable' |
| how does a normal quantile plot confirm the normality assumption? | if in a straight line, or close to it, then normal and assumption is reasonable |
| define the 68-95-99.7 rule | for normally distributed data: about 68% of results will be within 1 standard deviation of the mean; about 95% within 2 standard deviations of the mean; about 99.7% within 3 standard deviations of the mean |
| symbol for mean of a density curve? | μ |
| symbol for standard deviation of density curve? | σ |
| normal distribution short hand | X = random variable; N = normal distribution; first number in brackets = mean; second number in brackets = standard deviation |
| explain the standard normal variable | Z ~ N(0, 1); example of set out: P(Z < z); the probability corresponds to the area under the curve over the corresponding region; the table always gives the area to the left of z |
| use of the standard normal distribution table | to find P: look up z along the row and column headings of the table; to find z: locate P in the body of the table; table ordered from smallest to largest |
| reverse use of the standard normal distribution table | example of set out: P(Z < c) = p; find p in the body of the table and read off c, the value with area p to its left |
| X ~ ? | N(μ, σ) |
| formula and use of standardising transformation | Z = (X - μ)/σ; used when the distribution of X is not N(0, 1), so X must be converted to the standard normal before the table can be used |
| relationships between variables best explored through? why? | scatterplot; can get a sense for the nature of the relationship |
| how to define the nature of relationship? | existent/ non-existent; strong/ weak; increasing/ decreasing; linear/ non-linear |
| outliers in scatterplots? | represent some unexplainable anomalies in data; could reveal possible systematic structure worthy of investigation |
| define causal relationship | relationship between two variables where one variable causes changes to another |
| define the explanatory variable | explains or causes the change; written on x-axis |
| define the response variable | that which changes; written on y-axis |
| useful numbers for two quantitative variables? | correlation or regression |
| formula for the correlation coefficient? | $r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$ |
| define xi or yi | the individual observed values of the variable plotted on the corresponding axis |
| define xbar and ybar | mean of the observed values on the corresponding axis |
| define sx and sy | standard deviation of the observed values on the corresponding axis |
| state the properties of r | is the correlation coefficient; numerically expresses relationships; if close to 1 = strong positive linear relationship; if close to -1 = strong negative linear relationship; close to 0 = weak or non-existent linear relationship |
| state the cautions about the use of r | only useful for describing linear relationships; sensitive to outliers |
| what is least squares regression used for? | to explain how a response variable is related to an explanatory variable; positive slope = response increases as the explanatory variable increases; negative slope = response decreases |
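| mathematical representation of regression | $b_1 = r\,(s_y/s_x)$; $b_0 = \bar{y} - b_1\bar{x}$; $\hat{y} = b_0 + b_1 x$ |
| facts about b1 | $b_1$ is the slope of the fitted line; it has the same sign as the correlation coefficient r, but equals r only when $s_x = s_y$ |
| how to determine the strength of a regression | $r^2 = s_{\hat{y}}^2 / s_y^2$, the variance of the fitted values divided by the variance of y; r-squared is the proportion (often quoted as a %) of variation in y explained by the linear regression |
| state the basic regression assumptions | $y = b_0 + b_1 x + \text{error}$; errors average about 0; error corresponds to random scatter about line; this is checked by residual plots |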
| formula for a residual? | residual = y - y-hat (observed value minus fitted value) |
| residual plot is a scatter plot of? | residuals(y axis) against explanatory variable(x axis) |
| interpreting residual plots | focus on pattern; there should be no pattern; if there is a pattern then the linear assumption is incorrect |
| what to do if any residuals stand out? | they are either an influential point and to be left alone; or they are an outlier and to be removed if affecting results too much |
| how to attach special cause to an outlier | analyse if recording error; refit line; if remove then justify why (down weight influence) |
| translated residuals (removing the outlier) should have what effect? | spread pattern |
| any points sitting at 0 on a residual plot are? | exactly on the fitted line (observed y equals y-hat) |
| if a parabola appears after outlier removal? | linear (y-hat) assumption not appropriate |
| if spread doesn't vary far from 0? | there is no pattern |
| when to remove outlier | if influences results |
| when will outlier not influence results? | when close to mean; it will then have little influence on the gradient and intercept of the fitted line |
| what are lurking variables? | variables that can influence results which have not been taken into account |
| to account for lurking variables you? | analyse the covariance |
| state the strategy for using data in research? | identify question to be answered; identify population studied; locate variables: which one is IV and DV, explanatory and response; obtain data which answers question |
| define anecdotal data | haphazard collection of data; unreliable for drawing conclusions |
| define available data | use of data that has come from another source possibly obtained for a reason other than the one you intend to use it for |
| define collect your own data | use of a census, a survey, or observations from an experiment |
| define census | use of whole population to obtain data |
| define sample | use of a randomised selection of the population to represent the whole; smaller and easier to do than a census |
| explain observational study | no variables are manipulated or influenced; data obtained from population as it is |
| explain experiment | variables are influenced or manipulated so that responses can be noted and recorded; usually a control group is utilised; control group = does not undergo treatment, acts as a comparison group |
| explain causation | a response that is the result of another variable eg: moon's movements CAUSE the tides |
| common response in terms of variables means? | a single variable (often a lurking one) causes both response variables; the response variables are therefore associated with one another without one causing the other |
| causation in terms of variables means? | explanatory variable causes response variable; response variable and explanatory variable are associated |
| confounding in terms of variables means? | two or more explanatory variables are present and associated to one another; all explanatory variables could have caused response variable by themselves or together; explanatory variables called confounded causes |
| why an experiment? | allows demonstration of causation; intervention can be used to determine whether or not effect is present |
| state the principles of experiment design | subjects, treatment, factor, levels, response variable |
| definition of subjects in terms of experiment design | things upon which experiment is done; eg: people, animals, chemicals etc |
| definition of treatment in terms of experiment design | circumstances which are applied to subjects; eg: given medication |
| definition of factor in terms of experiment design | variables that are apparent within different treatments; eg: given medication or placebo |
| definition of levels in terms of experiment design | formation of treatments determined by which combination of factors used; eg: dosage of medication/how many doses per day vs dosage of placebo/ how many placebo taken |
| definition of response variable in terms of principle of experiment design | the variable which will answer the question; the variable of most interest, measured on the subject after treatment |
| explain the summary table for an experiment design | one factor labels the rows and the other the columns; levels in first column and first row; rest of table = number allocated to that particular treatment group |
| state the three principles all experiments must follow | compare two or more treatments where one is the control; random assignment of subjects to treatments; repeat the experiment on numerous subjects (for reduction of confounding variables) |
| how to randomise | allocate all subjects a random number; order subjects in accordance to those random numbers (smallest to largest, or largest to smallest); form treatments by selecting subjects in a systematic pattern applied to the random numbers representing subjects |
| define control group: | different from all other treatments as it only pretends to apply explanatory variable; is the group that the results are compared against |
| explain random comparative experiment | subjects randomly allocated one of several treatments; responses compared across treatment groups |
| explain matched pairs design | break subjects with similar properties into pairs; one of two treatments applied to one of each pair; can produce more precise results; used in before and after, and twin studies |
| explain random block design | block = group of subjects known before experiment to be similar in some way that would affect response; randomised assignment of treatments to subjects within block; matched pairs is special case of this |
| experimental caution: appropriate control | only variant across treatment(s) is/are factor(s) |
| experimental caution: beware of bias | administrator of experiment can present bias towards certain treatment to certain subjects; double blind accounts for this: neither subject nor administrator know which treatment applied |
| experimental caution: repetition of entire subjects | all steps for experiment are performed for all subjects in all treatments |
| experimental caution: realistic experiment | experiment needs to duplicate real-world conditions |
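
Several of the formulas in the cards above (summary statistics, the 1.5 x IQR outlier rule, linear transformations, z-scores, correlation, least squares regression and residuals) can be checked numerically. The following is only a minimal sketch in Python using the standard library; it is not part of the original card set. All function names (`summarise`, `outliers`, `linear_transform`, `standardise`, `within_k_sd`, `correlation`, `least_squares`, `residuals`, `r_squared`) and the sample data are made up for illustration. Quartile conventions differ between textbooks and software; this sketch assumes the default method of `statistics.quantiles`, so Q1 and Q3 may differ slightly from hand calculations.

```python
import statistics
from statistics import NormalDist, mean, stdev


def summarise(xs):
    """Location and spread for one quantitative variable."""
    # Assumption: quartiles use the default method of statistics.quantiles.
    q1, median, q3 = statistics.quantiles(xs, n=4)
    return {"mean": mean(xs), "median": median, "sd": stdev(xs),
            "q1": q1, "q3": q3, "iqr": q3 - q1}


def outliers(xs):
    """Values more than 1.5 x IQR below Q1 or above Q3."""
    s = summarise(xs)
    low = s["q1"] - 1.5 * s["iqr"]
    high = s["q3"] + 1.5 * s["iqr"]
    return [x for x in xs if x < low or x > high]


def linear_transform(xs, a, b):
    """xnew = a + b*x for every observation."""
    return [a + b * x for x in xs]


def standardise(x, mu, sigma):
    """z-score: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def within_k_sd(k):
    """Area under the standard normal curve between -k and k."""
    z = NormalDist()  # N(0, 1)
    return z.cdf(k) - z.cdf(-k)


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardised x and y values."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # both must be non-zero
    return sum(((x - xbar) / sx) * ((y - ybar) / sy)
               for x, y in zip(xs, ys)) / (n - 1)


def least_squares(xs, ys):
    """Return (b0, b1) with b1 = r*sy/sx and b0 = ybar - b1*xbar."""
    b1 = correlation(xs, ys) * stdev(ys) / stdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1


def residuals(xs, ys):
    """Observed y minus fitted y-hat for each point."""
    b0, b1 = least_squares(xs, ys)
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys):
    """Variance of the fitted values divided by the variance of y."""
    b0, b1 = least_squares(xs, ys)
    fitted = [b0 + b1 * x for x in xs]
    return statistics.variance(fitted) / statistics.variance(ys)


# Made-up example data: x = explanatory variable, y = response variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(summarise(y))
print(least_squares(x, y), r_squared(x, y))
print(residuals(x, y))
```

Plotting the output of `residuals` against `x` would give the residual plot described in the cards; values scattered around 0 with no pattern are consistent with the linear assumption.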