Statistics
Question | Answer |
---|---|
Cases | Objects described by a set of data. Cases may be customers, companies, subjects in a study, or other objects |
Label | A special variable used in some data sets to distinguish the different cases |
Variable | A variable is any characteristic of a case |
Categorical variable | A categorical variable places a case into one of several groups or categories |
Quantitative variable | Takes numerical values for which arithmetic operations such as adding and averaging make sense |
Examining a distribution | Look for the overall pattern and for striking deviations from that pattern; e.g. the overall pattern of a histogram can be described by its shape, center and spread |
Symmetric and Skewed distributions | A distribution is symmetric if the right and left sides of the histogram are approximately mirror images of each other; a distribution is skewed to the right (left) if the right (left) side of the histogram extends farther out than the left (right) side |
Mean | The mean is the average of a set of observations: a calculated "central" value of a set of numbers. |
Median | The median is the midpoint of a distribution, the number such that half the observations are smaller and the other half are larger |
Mode | The mode of a set of observations is the observation that is repeated more often than any other |
Quartiles | Each of four equal groups into which a population can be divided according to the distribution of values of a particular variable |
Five Number Summary | The five number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. |
Standard Deviation | The variance of a set of observations is essentially the average of the squares of the deviations of the observations from their mean. The standard deviation is the square root of the variance. |
Density curves | A density curve is a curve that is always on or above the horizontal axis and has area exactly 1 underneath it; it describes the overall pattern of a distribution; the area under the curve and above any range of values is the proportion of all observations that fall in that range |
Empirical Rule | In the Normal distribution with mean mu (μ) and standard deviation sigma (σ): approximately 68% of the observations fall within σ of μ; 95% of the observations fall within 2σ of μ; 99.7% of the observations fall within 3σ of μ |
Standard Normal distribution | The standard normal distribution is the Normal distribution N(0,1) with mean 0 and standard deviation 1 |
Histogram | A diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval |
Bar Chart | A diagram in which the numerical values of variables are represented by the height or length of lines or rectangles of equal width. |
Pie Chart | A type of graph in which a circle is divided into sectors that each represent a proportion of the whole |
Response variable | A response variable measures an outcome of a study |
Explanatory variable | An explanatory variable explains or influences changes in a response variable |
Scatterplot | A scatterplot displays the relationship between two quantitative variables measured on the same individuals |
Scatterplot Association | Two variables are positively associated when above average values of one tend to accompany above average values of the other; two variables are negatively associated when above average values of one tend to accompany below average values of the other |
Scatterplot Strength | The strength of a relationship is determined by how close the points in the scatterplot lie to a simple form such as a line |
Scatterplot Direction | If the relationship has a clear direction, we speak of either positive association (high values of the two variables tend to occur together) or negative association (high values of one variable tend to occur with low values of the other variable) |
Outlier | An individual value that falls outside the overall pattern of a distribution |
Correlation | The correlation (usually written as r) measures the direction and strength of the linear relationship between two quantitative variables. |
Important facts about correlation | r itself has no unit of measurement, it is just a number; r is always between -1 and 1; r near 0 implies a very weak linear relationship; r = -1 or r = 1 occurs only in the case of a perfect linear relationship; r measures the strength of only a linear relationship |
Probability | The probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions |
ANOVA | ANOVA, or analysis of variance, is a statistical method in which the variation in a set of observations is divided into distinct components. |
Linear Regression | Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables, the explanatory variable and the response variable |
Hypothesis testing | The theory, methods, and practice of testing a hypothesis by comparing it with the null hypothesis. The null hypothesis is rejected only if the p-value falls below a predetermined significance level α |
Confidence Interval | A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data |
P-value | The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one actually observed. |
Critical value | A critical value is the point (or points) on the scale of the test statistic beyond which we reject the null hypothesis, and is derived from the level of significance α of the test |
Level of Significance | The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. |
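
Several of the definitions above (mean, median, mode, five number summary, standard deviation, correlation, Empirical Rule) can be checked by hand on a small data set. The following is a minimal sketch using only Python's standard library. The sample values, the variable names, and the `correlation` helper are assumptions made for illustration, and the "inclusive" quartile convention is one of several in common use.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(data)      # average of the observations
median = statistics.median(data)  # midpoint of the sorted observations
mode = statistics.mode(data)      # most frequently repeated observation

# Quartile cut points; other textbooks and software may use a
# slightly different convention and give slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, median, q3, max(data))

# Sample variance (divides by n - 1); the standard deviation is its square root.
variance = statistics.variance(data)
std_dev = math.sqrt(variance)


def correlation(xs, ys):
    """Pearson correlation r as the average product of standardized values."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


# y is exactly 2x, so the points lie on a line with positive slope.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = correlation(x, y)

# Empirical Rule: area under the standard Normal curve N(0, 1)
# within 1, 2, and 3 standard deviations of the mean.
std_normal = statistics.NormalDist(mu=0, sigma=1)
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

print(mean, median, mode)
print(five_number_summary)
print(std_dev, r)
print(within)
```

Because `y` is a perfect linear function of `x`, `r` comes out at 1 up to floating-point rounding, and the three areas in `within` land close to the 68%, 95%, and 99.7% figures of the Empirical Rule.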