Question 1

a statistic is?

Accepted Answer

a summary of data

Question 2

the field of statistics is?

Accepted Answer

the collecting, analysing and understanding of data measured with uncertainty

Question 3

what is a categorical variable?

Accepted Answer

one which is measured descriptively eg: hair colour or major at university

Question 4

what is a quantitative variable?

Accepted Answer

one which is measured numerically eg: time it takes to get home from work

Question 5

graphical summary of one categorical variable?

Accepted Answer

bar graph

Question 6

graphical summary of one quantitative variable?

Accepted Answer

histogram or boxplot

Question 7

how to graphically summarise relationship between two categorical variables

Accepted Answer

clustered bar chart or jittered scatterplot

Question 8

how to graphically summarise relationship between two quantitative variables

Accepted Answer

scatterplot

Question 9

how to graphically summarise relationship between one categorical and one quantitative variable

Accepted Answer

comparative boxplots or comparative histograms

Question 10

what to look for in a graph

Accepted Answer

location, spread, shape, unusual observations

Question 11

define 'location' graphically

Accepted Answer

where most of the data lies

Question 12

define 'spread' graphically

Accepted Answer

variability of the data, how far apart or close together it is

Question 13

define 'shape' graphically

Accepted Answer

symmetric, skewed etc

Question 14

how to numerically summarise one categorical variable

Accepted Answer

table of frequencies or percentages
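
A table of frequencies and percentages can be built in a few lines. This is a minimal sketch using made-up hair colour data; the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical sample of hair colours (made-up data for illustration).
hair = ["brown", "black", "brown", "blonde", "brown", "black", "red", "brown"]

counts = Counter(hair)                 # frequency of each category
total = sum(counts.values())           # number of observations
percentages = {k: 100 * v / total for k, v in counts.items()}

# Print the table, most common category first.
for category, freq in counts.most_common():
    print(f"{category:<8}{freq:>3}{percentages[category]:>8.1f}%")
```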

Question 15

how to numerically summarise one quantitative variable

Accepted Answer

location: mean or median;
spread: standard deviation or interquartile range

Question 16

formula for mean?

Accepted Answer

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (add up all the values and divide by how many there are);
preferable for approximately normal data

Question 17

formula for Median?

Accepted Answer

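M = the middle value of the ordered data when n is odd, or the average of the two middle values, (mid1 + mid2)/2, when n is even;
less affected by outliers, therefore used for outlier-ridden data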
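
To see why the median is preferred for outlier-ridden data, the sketch below compares the mean and median before and after adding one extreme value. The commute times are made up for illustration.

```python
import statistics

# Hypothetical commute times in minutes (made-up data).
times = [20, 22, 25, 27, 30]
times_with_outlier = times + [120]     # one extreme observation added

mean_before = statistics.mean(times)
median_before = statistics.median(times)
mean_after = statistics.mean(times_with_outlier)
median_after = statistics.median(times_with_outlier)

# The mean is dragged towards the outlier; the median barely moves.
print(mean_before, median_before)
print(mean_after, median_after)
```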

Question 18

formula for standard deviation?

Accepted Answer

$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$;
preferable for approximately normal data
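
The standard deviation formula can be followed step by step in code. This is a sketch on made-up data; `sample_sd` is a name chosen here for illustration, and the result is compared against Python's built-in `statistics.stdev`, which also divides by n - 1.

```python
import math
import statistics

def sample_sd(xs):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n                               # mean
    squared_devs = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    return math.sqrt(squared_devs / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]                      # made-up data
s = sample_sd(data)
s_builtin = statistics.stdev(data)                   # should agree with s
print(s, s_builtin)
```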

Question 19

formula for interquartile range?

Accepted Answer

IQR = Q3 - Q1;
less affected by outliers, therefore used for outlier-ridden data

Question 20

which numbers are needed to create a five number summary?

Accepted Answer

minimum, Q1, median (sometimes mean included), Q3, maximum
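
A five number summary can be sketched as below. Note that textbooks and software use slightly different rules for quartiles; the `"inclusive"` method of `statistics.quantiles` is an assumption made here, not necessarily the rule used in the course. The function name and data are illustrative.

```python
import statistics

def five_number_summary(xs):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # Quartile convention is an assumption; other methods give slightly
    # different Q1 and Q3 on small data sets.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return min(xs), q1, median, q3, max(xs)

data = [1, 3, 5, 7, 9, 11, 13]          # made-up data
summary = five_number_summary(data)
print(summary)
```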

Question 21

an outlier is?

Accepted Answer

an observation more than 1.5 x IQR below Q1;
or more than 1.5 x IQR above Q3
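
The 1.5 x IQR rule can be sketched as a small function. The quartile method (`"inclusive"`) is an assumption, and `iqr_outliers` and the data are hypothetical names and values used only for illustration.

```python
import statistics

def iqr_outliers(xs):
    """Return the observations flagged by the 1.5 x IQR rule."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in xs if x < lower_fence or x > upper_fence]

data = [10, 12, 13, 14, 15, 16, 18, 40]  # made-up data with one extreme value
flagged = iqr_outliers(data)
print(flagged)
```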

Question 22

define linear transformation

Accepted Answer

transformation of a variable from x to xnew that only adds a constant and/or multiplies by a constant

Question 23

examples of linear transformation use

Accepted Answer

change of units;
standardising to 'z' scores when using the normal assumption

Question 24

formula for linear transformation?

Accepted Answer

xnew=a+bx

Question 25

formula for new mean once linear transformation has occurred?

Accepted Answer

xbarnew=a+bxbar

Question 26

formula for new median once linear transformation has occurred?

Accepted Answer

Mnew=a+bM

Question 27

formula for new standard deviation once linear transformation has occurred?

Accepted Answer

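snew=|b|s (a is dropped because shifting does not change spread; |b| is used because spread cannot be negative)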

Question 28

formula for IQR once linear transformation has occurred?

Accepted Answer

IQRnew=|b|IQR
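
The four rules for linear transformations can be checked numerically. The sketch below converts made-up Celsius temperatures to Fahrenheit (a = 32, b = 1.8) and also tries a negative b to show that spread scales by |b|. The helper `summaries` and the quartile method are assumptions for illustration.

```python
import statistics

def summaries(xs):
    """Return (mean, median, standard deviation, IQR)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return statistics.mean(xs), median, statistics.stdev(xs), q3 - q1

celsius = [10.0, 15.0, 20.0, 25.0, 30.0]       # made-up data
a, b = 32.0, 1.8                               # xnew = a + b*x
fahrenheit = [a + b * x for x in celsius]

mean_c, med_c, sd_c, iqr_c = summaries(celsius)
mean_f, med_f, sd_f, iqr_f = summaries(fahrenheit)

# Location follows a + b*(old value); spread is multiplied by |b| only.
print(mean_f, a + b * mean_c)
print(sd_f, abs(b) * sd_c)

# With a negative b the standard deviation is still positive.
b_neg = -2.0
flipped = [b_neg * x for x in celsius]
sd_flipped = statistics.stdev(flipped)
print(sd_flipped, abs(b_neg) * sd_c)
```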

Question 29

explain density curves

Accepted Answer

area under the curve in any range of values is the proportion of all observations that fall within that range for a quantitative variable;
like a smoothed-out histogram; describes probabilistic behaviour

Question 30

total area under density curve equals?

Accepted Answer

1

Question 31

explain the normality assumption

Accepted Answer

a normal curve can be used as the density curve if a histogram of the data looks like a normal curve;
the assumption is then termed 'reasonable';
the histogram should tail off towards 0 at both ends

Question 32

how does a normal quantile plot confirm the normality assumption?

Accepted Answer

if in a straight line, or close to it, then normal and assumption is reasonable

Question 33

define the 68-95-99.7 rule

Accepted Answer

for approximately normal data:
about 68% of observations will be within 1 standard deviation of the mean;
about 95% of observations will be within 2 standard deviations of the mean;
about 99.7% of observations will be within 3 standard deviations of the mean
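
The three percentages in the rule are areas under the standard normal curve, which can be computed rather than memorised. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `within` is illustrative.

```python
from statistics import NormalDist

Z = NormalDist(mu=0, sigma=1)   # standard normal distribution

def within(k):
    """Area under the standard normal curve between -k and +k."""
    return Z.cdf(k) - Z.cdf(-k)

coverage = {k: within(k) for k in (1, 2, 3)}
for k, p in coverage.items():
    print(f"within {k} standard deviation(s): {100 * p:.1f}%")
```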

Question 34

symbol for mean of a density curve?

Accepted Answer

μ

Question 35

symbol for standard deviation of density curve?

Accepted Answer

σ

Question 36

normal distribution shorthand

Accepted Answer

written X ~ N(μ,σ);
X = random variable;
N = normal distribution;
first number in brackets = mean;
second number in brackets = standard deviation

Question 37

explain the standard normal variable

Accepted Answer

Z ~ N(0,1), i.e. normal with mean 0 and standard deviation 1;
example of set out: P(Z < z) = p;
p corresponds to the area under the curve over the corresponding region;
the table always gives the area to the left of z

Question 38

use of the standard normal distribution table

Accepted Answer

to find P: look up z along the row and column margins of the table;
to find z: look up P in the body of the table;
table ordered from smallest to largest

Question 39

reverse use of the standard normal distribution table

Accepted Answer

eg of how set out: P(Z<c) = p, where p is known;
c = the value of Z with area p to its left
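
Reverse use of the table corresponds to an inverse lookup: given an area, find the cut-off. A minimal sketch with `statistics.NormalDist.inv_cdf`, using p = 0.975 as an arbitrary example value:

```python
from statistics import NormalDist

Z = NormalDist()            # standard normal, N(0, 1)

p = 0.975                   # known area to the left (example value)
c = Z.inv_cdf(p)            # the cut-off c with P(Z < c) = p

# Going forward again should recover the area we started with.
p_check = Z.cdf(c)
print(round(c, 2), round(p_check, 3))
```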

Question 40

X =?

Accepted Answer

X ~ N(μ,σ), i.e. X is normally distributed with mean μ and standard deviation σ

Question 41

formula and use of standardising transformation

Accepted Answer

Z = (X-μ)/σ;
used when X ~ N(μ,σ) is not already N(0,1), so it must be converted before the standard normal table can be used
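
A worked sketch of standardising, assuming a hypothetical variable X ~ N(170, 10) and asking for P(X < 185). The numbers are made up; the point is that the z-score route and the direct calculation give the same area.

```python
from statistics import NormalDist

mu, sigma = 170.0, 10.0      # hypothetical mean and standard deviation
x = 185.0                    # value of interest

z = (x - mu) / sigma         # standardising transformation

p_from_z = NormalDist().cdf(z)            # area left of z under N(0, 1)
p_direct = NormalDist(mu, sigma).cdf(x)   # same area without standardising
print(z, round(p_from_z, 4), round(p_direct, 4))
```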

Question 42

relationships between variables best explored through? why?

Accepted Answer

scatterplot;
can get a sense for the nature of the relationship

Question 43

how to define the nature of relationship?

Accepted Answer

existent/ non-existent;
strong/ weak;
increasing/ decreasing;
linear/ non-linear

Question 44

outliers in scatterplots?

Accepted Answer

may represent unexplained anomalies in the data;
could reveal possible systematic structure worthy of investigation

Question 45

define causal relationship

Accepted Answer

relationship between two variables where one variable causes changes to another

Question 46

define the explanatory variable

Accepted Answer

explains or causes the change;
plotted on the x-axis

Question 47

define the response variable

Accepted Answer

that which changes;
plotted on the y-axis

Question 48

useful numbers for two quantitative variables?

Accepted Answer

correlation or regression

Question 49

formula for the correlation coefficient?

Accepted Answer

$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$
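
The correlation formula is the average (with an n - 1 divisor) of the products of standardised x and y values. A minimal sketch on made-up data; the function name `correlation` is chosen here for illustration.

```python
import statistics

def correlation(xs, ys):
    """Correlation coefficient r from standardised x and y values."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    products = (((x - xbar) / sx) * ((y - ybar) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]         # made-up explanatory values
ys = [2, 4, 5, 4, 5]         # made-up response values
r = correlation(xs, ys)

# Points lying exactly on an increasing line give r = 1 (up to rounding).
r_perfect = correlation([1, 2, 3], [2, 4, 6])
print(r, r_perfect)
```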

Question 50

define xi or yi

Accepted Answer

the individual observed values of the x variable or the y variable

Question 51

define xbar and ybar

Accepted Answer

means of the observed x values and of the observed y values

Question 52

define sx and sy

Accepted Answer

standard deviations of the observed x values and of the observed y values

Question 53

state the properties of r

Accepted Answer

is the correlation coefficient;
numerically expresses relationships;
if close to 1 = strong positive linear relationship;
if close to -1 = strong negative linear relationship;
close to 0 = weak or non-existent linear relationship

Question 54

state the cautions about the use of r

Accepted Answer

only useful for describing linear relationships;
sensitive to outliers

Question 55

what is least squares regression used for?

Accepted Answer

to explain how a response variable is related to explanatory variable;
slope positive = response increases as explanatory variable increases;
slope negative = response decreases as explanatory variable increases

Question 56

mathematical representation of regression

Accepted Answer

b1=r(sy/sx);
b0=ybar-b1xbar;
yhat=b0+b1x
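
The slope and intercept can be computed directly from r, the two standard deviations and the two means, with the intercept taken as ybar minus b1 times xbar. A sketch on made-up data; `least_squares` is an illustrative name.

```python
import statistics

def least_squares(xs, ys):
    """Return (b0, b1) for the least squares line yhat = b0 + b1*x."""
    n = len(xs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    r = sum(((x - xbar) / sx) * ((y - ybar) / sy)
            for x, y in zip(xs, ys)) / (n - 1)
    b1 = r * sy / sx            # slope
    b0 = ybar - b1 * xbar       # intercept: line passes through (xbar, ybar)
    return b0, b1

xs = [1, 2, 3, 4, 5]            # made-up explanatory values
ys = [2, 4, 5, 4, 5]            # made-up response values
b0, b1 = least_squares(xs, ys)
prediction = b0 + b1 * 6        # predicted response at x = 6
print(b0, b1, prediction)
```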

Question 57

facts about b1

Accepted Answer

b1 = slope of the regression line = r(sy/sx);
always has the same sign as the correlation coefficient r, but only equals r when sx = sy

Question 58

how to determine the strength of a regression

Accepted Answer

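$r^2 = s_{\hat{y}}^2 / s_y^2$, the variance of the fitted values divided by the variance of y;
r-squared is the proportion (often quoted as a %) of the variation in y explained by the linear regression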
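
The link between r-squared and explained variation can be checked numerically: the variance of the fitted values divided by the variance of y should equal the square of r. A sketch on made-up data, where the slope is computed as Sxy/Sxx, which is algebraically the same as r(sy/sx).

```python
import statistics

xs = [1, 2, 3, 4, 5]            # made-up explanatory values
ys = [2, 4, 5, 4, 5]            # made-up response values

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
sxx = sum((x - xbar) ** 2 for x in xs)
syy = sum((y - ybar) ** 2 for y in ys)
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

b1 = sxy / sxx                  # slope (same value as r * sy / sx)
b0 = ybar - b1 * xbar           # intercept
fitted = [b0 + b1 * x for x in xs]

r = sxy / (sxx * syy) ** 0.5    # correlation coefficient
r_squared = r ** 2
explained = statistics.variance(fitted) / statistics.variance(ys)
print(r_squared, explained)
```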

Question 59

state the basic regression assumptions

Accepted Answer

y=b_0+b_1x+error;
error is centred on 0 (no systematic over- or under-prediction);
error corresponds to random scatter about line;
this is checked by residual plots

Question 60

formula for residuals?

Accepted Answer

residual = y - y-hat (observed value minus fitted value)

Question 61

residual plot is a scatter plot of?

Accepted Answer

residuals (y-axis) against explanatory variable (x-axis)
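
The points for a residual plot can be computed as below: fit the line, then pair each x value with its residual (observed y minus fitted y). This is a sketch on made-up data with illustrative variable names; the actual plotting is left out.

```python
import statistics

xs = [1, 2, 3, 4, 5]            # made-up explanatory values
ys = [2, 4, 5, 4, 5]            # made-up response values

xbar, ybar = statistics.mean(xs), statistics.mean(ys)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))      # least squares slope
b0 = ybar - b1 * xbar                          # least squares intercept

# residual = observed y minus fitted y
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# (x, residual) pairs: x goes on the x-axis, residual on the y-axis.
plot_points = list(zip(xs, residuals))
for x, e in plot_points:
    print(x, round(e, 2))
```

For a least squares line with an intercept, the residuals sum to zero, so the plot should scatter around the horizontal line at 0.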

Question 62

interpreting residual plots

Accepted Answer

focus on pattern;
there should be no pattern;
if there is a pattern then the linear assumption is incorrect

Question 63

what to do if any residuals stand out?

Accepted Answer

they are either an influential point, which is left in;
or an outlier, which may be removed if it affects the results too much

Question 64

how to attach special cause to an outlier

Accepted Answer

check whether it is a recording error;
refit the line without it;
if removing it, justify why (or down-weight its influence)

Basic Stats

For Life Sciences