BIOS480 Exam 1

Definitions and info for exam 1 of BIOS480

Term / Definition
Sample a random collection of observations from a population
Population all possible observations
Frequentist statistics defines the probability of an outcome as how often it is expected to occur in a very long run of events
Random variable a variable whose values are not known for certain before a sample is taken
Sample space the set of all possible outcomes of a random variable
Probability distribution The distribution of the probability of observing each outcome in the sample space (tells you how high, or low, the likelihood of seeing an outcome is)
Probability mass function a mathematical function, used for discrete variables, that assigns a probability to each potential outcome
Probability density function used for continuous variables, vertical axis is the probability density of the variable (f(x))
For a continuous variable, P(X = exact value) = ? 0; it is impossible to get the probability of a very specific outcome if the variable is continuous
Expected value (E(x)) the mean of the probability distribution of a random variable ("the average"); the probability-weighted average of all outcomes in the sample space
Mean the arithmetic average value
Standard deviation the dispersion of the data about the mean
Kurtosis The "fatness" of the tails of the distribution; degree of outliers in the distribution
Skewness Refers to deviations in the distribution's symmetry
Right-skewed mean > median > mode
Left-skewed mode > median > mean
Normal (Gaussian) distribution Symmetric distribution where most observations are around the mean (bell curve)
Normal distribution features Symmetric bell shape (curve); mean = median at center; parameters = mean and standard deviation; 68% of data w/in 1 SD of mean, 95% of data w/in 2 SD of mean, 99.7% of data w/in 3 SD of mean
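
The 68/95/99.7 percentages above can be checked against the standard normal distribution. A minimal Python sketch (assuming SciPy is installed; not part of the original card set):

```python
# Check the 68-95-99.7 rule against the standard normal CDF.
from scipy.stats import norm

for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)  # probability within k SDs of the mean
    print(f"within {k} SD: {coverage:.3f}")
# Prints roughly 0.683, 0.954, and 0.997
```
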
Lognormal distribution Right-skewed data whose logarithm is normally distributed
Lognormal distribution features Continuous, skewed distribution; low mean values; large variance; all-positive values; parameters = mean and standard deviation. If you take the log, the distribution shifts to normal (data transformation)
Exponential distribution Models elapsed time between two events; continuous probability distribution describing the waiting time until the next event in a Poisson process
Exponential distribution features Single parameter distribution Parameter = rate
Beta distribution Used to represent percentages, proportions, or probability outcomes
Beta distribution features Defined on the interval [0, 1]; parameters = alpha and beta (two positive shape parameters that appear as exponents of the random variable and control the shape of the distribution)
Bernoulli distribution Single trial with two possible outcomes (e.g., a coin toss); outcomes are "success" or "failure"
Binomial distribution A sequence of Bernoulli events; the probability distribution of the number of successes in a set number of independent trials
Binomial distribution features Parameters = number of trials (n) and probability of success in a single trial (p); the expected value of a binomial variable X is the number of times a success is expected to occur in n total trials
Multinomial distribution Generalization of the binomial distribution to more than two possible outcomes; the joint probability distribution of multiple outcomes from n fixed trials
Poisson distribution The probability that a given number of events may occur; describes variables representing the # of occurrences of a particular event in an interval of time/space
Poisson distribution features Parameter (lambda): expected value = variance; generally right-skewed
Uniform distribution All outcomes are equally likely
Sample size # of observations in the sample (n)
Statistics Measured characteristics of the SAMPLE (Ex., sample mean)
Parameters Characteristics of the POPULATION (Ex., population mean)
Random (simple) sampling Basic method of collecting observations in a sample; any observation has the same probability of being collected; aim is to sample in a manner that doesn't create bias/favor any observation being selected
Random sample = ? Independent and identically distributed (IID)
Independent (IID) Sample items are all independent events Knowledge of the value of one variable gives no information about the other and vice versa
Identically distributed (IID) No overall trends The distribution doesn't fluctuate, and all items in the sample are taken from the same probability distribution
Random sampling is usually ? Haphazard; populations must be defined at the start of a study, therefore there are spatial and temporal limits
More than 1 in 20 US teens have diagnosed anxiety or depression (Parameter or statistic) Statistic; the # describes the whole population of US teens, but it is impossible to collect info from every member, so it must be estimated from a sample
Latvian women are the tallest on the planet w/ a mean height of 170 cm (Parameter or statistic) Statistic; describes the whole population, but it is not feasible to measure the height of every Latvian woman, so the mean must come from a sample
The median annual income of all 37 employees at Company Y is $42,000 (Parameter or statistic) Parameter; it describes ALL employees at Company Y, not just a portion of them
The avg final math exam scores of all seniors from high school A have increased from 70% to 78% in the past decade (Parameter or statistic) Parameter; the % change refers to the entire population of seniors at high school A
A good estimator (i.e., statistic) of a population parameter should have the following characteristics: Unbiased - expected value of the sample statistic should = the parameter; Consistent - as sample size increases, the statistic gets closer to the population parameter; Efficient - it has the lowest variance among all competing statistics
The two broad types of estimation are ? and ? Point estimate and interval estimate
Point estimate provides a single value which estimates a population parameter
Parts of a point estimate Mean: estimator of the population mean, each observation weighted by 1/n; Median: middle measurement of the data set; Trimmed mean; Winsorized mean
Trimmed mean Mean calculated after omitting a proportion (usually 5%) of highest and lowest observations
Winsorized mean Same as the trimmed mean except the omitted observations are replaced by the nearest remaining value
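
A minimal Python sketch of the point estimates above (mean, median, trimmed mean, Winsorized mean), assuming NumPy and SciPy are available; the sample values are made up for illustration:

```python
import numpy as np
from scipy.stats import trim_mean
from scipy.stats.mstats import winsorize

rng = np.random.default_rng(1)
x = np.append(rng.normal(10, 1, 19), 50.0)        # 20 values with one extreme outlier

print(np.mean(x))                                 # ordinary mean, pulled up by the outlier
print(np.median(x))                               # middle measurement
print(trim_mean(x, 0.05))                         # omit 5% of highest and lowest observations
print(winsorize(x, limits=(0.05, 0.05)).mean())   # replace them with the nearest remaining value
```
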
Interval estimate Provides a range of values that might include the parameter with a known probability (Ex., confidence intervals)
Range The difference between the largest and smallest observation; simplest measure of spread, but there is no clear link between sample range and population range; generally increases as sample size increases
Sample variance Estimate of the population variance
Sample variance steps 1. Calculate the mean 2. For each observation, subtract the mean and square the result 3. Work out the average of those squared differences (dividing by n - 1). The result is s^2; if you need the spread in the original units, take the square root
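
The three steps translate directly into code. A minimal NumPy sketch (not from the deck; the data values are arbitrary), using the usual n - 1 denominator for the sample variance:

```python
import numpy as np

x = np.array([4.0, 7.0, 6.0, 5.0, 8.0])

mean = x.mean()                        # 1. calculate the mean
sq_diffs = (x - mean) ** 2             # 2. subtract the mean from each value and square the result
s2 = sq_diffs.sum() / (len(x) - 1)     # 3. average the squared differences (n - 1 denominator)
s = np.sqrt(s2)                        # square root to get back to the original units

print(s2, np.var(x, ddof=1))           # both values match
```
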
Standard deviation Square root of sample variance
Coefficient of variation Ratio of standard deviation to the mean and shows the extent of variability in relation to the mean of the population
Interquartile range difference between the first quartile (the observation which has 25% of the observations below it) and the third quartile (the observation which has 25% of the observations above it); used in the construction of box plots
Median absolute deviation (MAD) less sensitive to outliers than the other measures and is the sensible measure of spread to present in association with medians
Confidence intervals A range of values where you can be relatively confident the true value will be
Central limit theorem The sampling distribution of a sample mean is approximately normal if the sample size is large enough (n > 30), even if the population distribution is not normal; it is the distribution of averages (means)
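
A small simulation illustrates the idea: even when the population is skewed, the means of repeated samples pile up in a roughly normal shape. A NumPy sketch (illustrative only; the exponential population and sample size are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 samples of size n = 40 from a skewed (exponential) population with mean 2
samples = rng.exponential(scale=2.0, size=(10_000, 40))

sample_means = samples.mean(axis=1)
print(sample_means.mean())   # close to the population mean, 2.0
print(sample_means.std())    # close to sigma / sqrt(n) = 2 / sqrt(40)
```
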
Formula for confidence interval estimate for a population mean Sample mean +/- critical value * standard error
Standard error The standard deviation of the sample means (s)/(square root(n))
Critical value A cutoff value, obtained from the t-distribution, that sets the width of the confidence interval around the sample mean
How to obtain a "t" critical value Obtained from the t-distribution based on the degrees of freedom (df) and the confidence level you are using
"t" critical value Used during statistical tests to assess the statistical significance of the difference between wo sample means, the constriction of confidence intervals and in linear regression analysis
Degrees of freedom (df) The number of independent values that a statistical analysis can estimate; df = n - 1; you want as many degrees of freedom as possible
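
Putting the last few cards together (standard error, t critical value, df = n - 1), a minimal SciPy/NumPy sketch of a 95% confidence interval; the data values are made up:

```python
import numpy as np
from scipy import stats

x = np.array([5.1, 4.9, 6.0, 5.5, 5.8, 4.7, 5.2, 5.9])
n = len(x)

se = x.std(ddof=1) / np.sqrt(n)          # standard error = s / sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)    # t critical value for a 95% confidence level
ci = (x.mean() - t_crit * se, x.mean() + t_crit * se)
print(ci)                                # sample mean +/- critical value * standard error
```
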
What if you knew the population standard deviation and want to calculate the confidence interval? Sample mean +/- z critical value * standard error (the z-distribution is used instead of the t-distribution)
z-distribution The Standard Normal distribution; form of the normal distribution in which the mean is zero and the standard deviation is 1
t- vs z-distribution z-distribution assumes that you know the population standard deviation. t-distribution is based on the sample standard deviation
Methods for estimating parameters Method of moments Maximum likelihood Ordinary least squares Bootstrapping
Moments of a function (Method of moments) Every distribution has moments: mean, standard deviation, skewness, kurtosis
Maximum likelihood Method that determines values for the parameters of a model "Which curve was most likely responsible for creating the data points that we observed?"
Maximum likelihood estimates (MLE) Find the parameter values that give the distribution that maximizes the probability of observing the data; calculated from the total probability of observing all of the data; for a normal model, the mean and standard deviation are adjusted until the likelihood is maximized
With MLE you assume what if it follows a normal probability distribution? Assuming the data follow a normal probability distribution, the probability density of each point is multiplied by that of the others to give the likelihood of the whole sample
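
A minimal sketch of maximum likelihood for a normal model, assuming NumPy/SciPy: the log-densities of all points are summed (equivalent to multiplying the densities) and the mean and standard deviation are adjusted until that total is as large as possible. The simulated data and starting values are illustrative:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

x = np.random.default_rng(42).normal(loc=10.0, scale=2.0, size=200)

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf                      # keep the scale parameter positive
    # Sum of log densities; minimising the negative sum maximises the likelihood
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=sigma))

result = minimize(neg_log_likelihood, x0=[1.0, 1.0], method="Nelder-Mead")
print(result.x)   # MLE of (mu, sigma), close to the sample mean and SD
```
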
Ordinary least squares (OLS) The least squares estimator for a given parameter is the one that minimizes the sum of the squared differences between each value in a sample and the parameter
Major application of OLS estimation Estimating parameters of linear models, where the quantity minimized is the sum of squared differences between observed values and those predicted by the model
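
A minimal NumPy sketch of least squares for a straight line (the simulated data are illustrative): the fitted slope and intercept are the values that minimize the sum of squared differences between observed and predicted y:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 10, 50)
y = 3.0 + 1.5 * x + rng.normal(0, 1, size=x.size)    # true line plus noise

slope, intercept = np.polyfit(x, y, deg=1)           # np.polyfit solves the least-squares problem
print(slope, intercept)                              # close to 1.5 and 3.0

residuals = y - (intercept + slope * x)
print(np.sum(residuals ** 2))                        # the minimised sum of squared differences
```
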
ML and OLS are most commonly used for... Estimating population parameters for the analyses we will discuss in this class
Bootstrapping Stat. technique for estimating quantities about a population by avg. estimates from multiple small data samples (resampling w/ replacement)
Bootstrapping applications Estimate confidence intervals for parameters (mean, median, variance, etc.) Estimate p-values when traditional parametric methods are challenging/not applicable Assess stability and reliability of models
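
A minimal NumPy sketch of a bootstrap confidence interval for the mean (sample data and number of resamples are illustrative): resample with replacement many times and take percentiles of the resampled means:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=2.0, size=50)              # a small, skewed sample

boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()  # resample with replacement
    for _ in range(5000)
])
ci_95 = np.percentile(boot_means, [2.5, 97.5])       # bootstrap 95% CI for the mean
print(ci_95)
```
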
Hypothesis testing Statistical methods used to determine whether a pattern in the sample data is likely to hold true in the population from which the sample was drawn
Hypothesis testing steps 1. State null and alternative hypotheses 2. Choose a significance level 3. Collect data and calculate the test statistic 4. Calculate p-value 5. Make a decision about the null hypothesis
Reject H0 if... p-value < significance level = reject H0
Fail to reject H0 if... p-value > significance level = fail to reject H0
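
The decision rule in the two cards above, shown with a one-sample t-test in SciPy (the data, null value of 5, and alpha of 0.05 are illustrative):

```python
from scipy import stats

x = [5.6, 6.1, 5.9, 6.4, 5.8, 6.0, 5.7, 6.2]
alpha = 0.05                                           # chosen significance level

t_stat, p_value = stats.ttest_1samp(x, popmean=5.0)    # H0: population mean = 5
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject H0")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject H0")
```
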
Two-tailed In most cases in biology, the H0 is that there is no effect and the HA can be in either direction; distributes uncertainty to two sides, whereas one-tailed uses only one; H0 is rejected if either mean is bigger than the other; tests for an effect in both directions (+/-)
One-tailed Distributes uncertainty to only one side; only allows for testing an effect in one direction
Parametric tests Statistical tests most commonly used by biologists Ex., z-test, t-test, ANOVA
4 assumptions of parametric tests Normality Homogeneity of variances Linearity Independence
How to check for normality To test the assumption of a normal distribution, skewness should be w/in the +/- 2 range and kurtosis values should be w/in the +/- 7 range (LOOK FOR BELL CURVE); Shapiro-Wilk's W test; Kolmogorov-Smirnov test; histograms, boxplots, and Q-Q plots
Q-Q (quantile-quantile) plots Observed quantiles and expected quantiles are plotted against each other; the more the plotted values deviate from a straight line, the less normally distributed the data are
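
Two of the checks above in SciPy (Shapiro-Wilk and a quantile-quantile comparison); the skewed lognormal data are simulated for illustration:

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(11).lognormal(mean=0.0, sigma=0.8, size=100)   # right-skewed data

w_stat, p_value = stats.shapiro(x)
print(p_value)                         # small p-value: data unlikely to be normal

# Q-Q comparison without plotting: expected normal quantiles vs observed quantiles
(osm, osr), _ = stats.probplot(x, dist="norm")
print(np.corrcoef(osm, osr)[0, 1])     # noticeably below 1 suggests departure from a straight line
```
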
Most common asymmetry in biological data Positive skewness, often b/c variables have a lognormal or Poisson distribution; transformations of skewed variables can often improve normality
Homogeneity of variances More important than normality (tests are more robust if sample sizes are equal); linear model hypotheses assume that variance in the response variable is the same at each level, or combination of levels, of the predictor variables
How to check for homogeneity of variances Bartlett's test Levene's test Fligner-Killeen test
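
The three tests listed above are all available in SciPy; a minimal sketch with two made-up groups of unequal spread:

```python
from scipy import stats

group_a = [4.1, 3.9, 4.3, 4.0, 4.2]
group_b = [3.0, 5.1, 2.4, 6.0, 4.6]      # visibly more spread out

print(stats.bartlett(group_a, group_b))  # Bartlett's test (sensitive to non-normality)
print(stats.levene(group_a, group_b))    # Levene's test (more robust)
print(stats.fligner(group_a, group_b))   # Fligner-Killeen test (non-parametric)
```
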
Linearity Parametric correlation and linear regression analyses are based on straight-line relationships between variables
Independence Implies that all observations should be independent of each other, both w/in and between groups; the most common situation where this assumption is not met is when data are recorded in a time sequence
Graphical exploration of data is used to ... Assess assumption violations; detect errors in data entry; detect patterns in data that may not be revealed by the statistical analysis you will use; detect unusual values (i.e., outliers)
Histograms Graph used to represent the distribution of data points of one variable; often classifies data into various "bins" or "range groups" and counts how many data points belong to each bin
Boxplots A plot showing the median in the center, the spread, the quartiles (25% and 75%), and potential outliers; because boxplots are based on medians and quartiles, they are very resistant to extreme values; good for displaying single-variable samples of 8+ observations
Scatter plots Vertical axis represents one variable, the horizontal axis represents the other variable, and the points on the plot are individual observations; often used b/c we are interested in the relationship b/t variables
Four possible outcomes of testing a null hypothesis (H0) Correctly reject H0 ("significant" result), implies HA is true; correctly retain H0, implies HA is false; Type I error, rejecting H0 when it is true; Type II error, not rejecting H0 when it is false
Type I error Rejecting the null hypothesis (H0) when it is true; concluding results are stat. significant when they actually occurred by chance or from unrelated factors; the risk is set by the significance level you choose
Type II error Not rejecting H0 when it is false; failing to conclude there is an effect when there was one; the study may have needed more statistical power to detect it
Risk of type II error is inversely related to the ... statistical power of the study; higher stat. power = lower probability of type II error
To reduce the risk of type II error you can ... increase sample size or significance level
Statistical power is determined by the ... Size of the effect (larger effects are more easily detected); measurement error (systematic/random errors in data reduce power); sample size (larger sample = reduced error + increased power); significance level (inc. significance level = inc. power)
Standard error Standard deviation of the sample means, (s)/(square root of n); s = sample estimate of SD; n = sample size
Type I vs Type II error trade-offs Type I and II influence one another; sig. level (type I) affects stat. power, which is inversely related to the type II error rate; setting a lower sig. level decreases type I risk, but inc. type II; inc. the power of a test decreases type II risk, but inc. type I
Rank-based non-parametric test examples Wilcoxon signed rank test Mann-Whitney U test Kruskal-Wallis H test Spearman’s coefficient
Non-parametric tests Statistical tests that do not assume anything about the distribution followed by the data (aka distribution-free tests) Based on ranks held by different data points
Non-parametric test steps (1-4) 1. Rank all observations, ignoring groups 2. Calculate the sum of ranks for both samples 3. Compare the smaller rank sum to the probability distribution of rank sums and test in the usual manner 4. For larger samples, the rank sum is approx. normally distributed, so use a z-stat.
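
A rank-sum procedure like the one outlined above is what the Mann-Whitney U test implements; a minimal SciPy sketch with made-up samples:

```python
from scipy import stats

control = [12, 15, 11, 14, 13, 16]
treated = [18, 22, 17, 21, 19, 20]

# Ranks are computed internally; the test compares the rank sums of the two samples
u_stat, p_value = stats.mannwhitneyu(control, treated, alternative="two-sided")
print(u_stat, p_value)
```
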
Non-parametric test advantages More stat. power when assump. of parametric tests are violated; assump. of normality does not apply; small sample sizes are ok; can be used for all data types (ordinal, nominal, interval)
Non-parametric test disadvantages Less powerful than parametric tests if assump. haven't been violated
Randomization/permutation tests Resample/reshuffle original data many times to generate the sampling distribution of the test stat. directly; generates simulated data like those we would expect under H0
Randomization/permutation tests advantages Useful when analyzing data for which the distribution is unknown, or when sampling from populations is not possible (e.g., museum specimens)
Randomization/permutation tests disadvantages Computationally intensive; difficult to compare p-values across different analyses/studies; may have lower power when sample sizes are small
Randomization/permutation tests main application Often used to double check more traditional hypothesis test methods. If both tests are significant, then you can be pretty confident about your results.
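
A minimal NumPy sketch of a permutation test on the difference in group means (data and number of reshuffles are illustrative): group labels are reshuffled many times to build the null distribution of the test statistic directly:

```python
import numpy as np

rng = np.random.default_rng(5)
a = np.array([6.2, 5.9, 6.8, 6.4, 6.1])
b = np.array([5.1, 5.4, 5.0, 5.6, 5.2])

observed = a.mean() - b.mean()
pooled = np.concatenate([a, b])

null_diffs = []
for _ in range(10_000):
    shuffled = rng.permutation(pooled)                # reshuffle group labels under H0
    null_diffs.append(shuffled[:a.size].mean() - shuffled[a.size:].mean())

p_value = np.mean(np.abs(null_diffs) >= abs(observed))   # two-tailed p-value
print(p_value)
```
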
Correcting a violation can be done by ... Transforming the data
Transformations can ... Make data closer to a normal distribution; reduce the relationship b/t the mean and variance; reduce outlier influence; improve linearity in regression analyses
Types of transformations Power Log Arc-sine Box-Cox
Power transformation + application Transforms Y to Y^p, where p is greater than 0; for data w/ right skew. p = 0.5 (square root) for data that are counts (Poisson) and where the variance is related to the mean; cube roots (p = 0.33), fourth roots (p = 0.25), etc., are effective for data that are increasingly skewed
Log transformation + application Transforming data to logarithms; makes positively skewed distributions more symmetrical, especially when the mean is related to the SD; lognormal data are so named b/c they become normal after log transforming the values
Arc-Sine transformation + application Taking the arcsine of the square root of a number; the result is given in radians and can range from −π/2 to π/2; numbers must be in the range 0 to 1; commonly used for proportions and probabilities
Box-Cox transformation + application Can be used to find the best transformation in terms of homogeneity of variance and normality; transformation based on an exponent (lambda, λ), which varies from -5 to 5; all values of λ are considered and the optimal value is selected
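
The four transformations above in NumPy/SciPy; the simulated count and proportion data are illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
counts = rng.poisson(lam=4, size=100).astype(float) + 1   # shifted to keep all values positive
proportions = rng.uniform(0.05, 0.95, size=100)

sqrt_t   = counts ** 0.5                        # power transform, p = 0.5 for count data
log_t    = np.log(counts)                       # log transform for right-skewed data
arcsin_t = np.arcsin(np.sqrt(proportions))      # arc-sine transform for proportions
boxcox_t, best_lambda = stats.boxcox(counts)    # Box-Cox searches for the optimal lambda
print(best_lambda)
```
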
Created by: sdelo
 

 


