Term | Definition |
ANOVA (Analysis of Variance) | A parametric test that compares the means of two or more groups. Similar to a t-test & can sub for a t-test when only two groups are compared. Will show if there is a difference in means among the groups. Will not show which group differs from which. |
Assumptions | Characteristics that must be satisfied in order for a statistical test to be accurate and robust. |
Bar graph | A chart that reports the frequencies of observations in groupings (bins) of qualitative data. Gaps between columns indicate that the data are qualitative (nominal, categorical, or ordinal). |
Bin | In a frequency table or chart, bins are ranges of values used to group data. Range for each bin should be equal between all bins. Must be mutually exclusive, meaning no observation can be counted in more than one bin. |
Binary data | A type of nominal or categorical data that includes only two discrete categories, such as T/F, Y/N, or male/female. |
Box & Whisker plots | A chart used to show a five-number summary. Often displays outliers. |
Categorical data | Non-measurable observations that can each be assigned to one unique category (similar to nominal data). Can be many categories. Categorical data are converted into numerical data by counting the number of observations in each category. |
Central tendency | A single number that represents the typical value for the data set, expressed as a mean, median and/or mode. |
Chi-square (χ²) | A nonparametric statistical test most often used to test for differences in frequencies between two or more groups. |
Confidence interval | The range in which the true parameter is expected to be found, w/in a given level of confidence. Most frequently used value is 95%. A 95% CI means that if the study were repeated many times, about 95% of the intervals calculated this way would contain the true parameter. Useful b/c the width of the interval shows whether the estimate is sufficiently precise (narrow). |
Continuous data | Interval & ratio data. Infinite # of possible values when measuring on a continuous scale. Ordered data w/ no natural categories; may have a true zero (ratio) or no true zero (interval), & have measurable/equivalent distance between each increment in the scale. |
Data transformation | A procedure that converts data points into another form (e.g., by taking the logarithm of each value) so that the transformed data more closely follow a normal distribution. |
Descriptive Statistics | Numerically describing a phenomenon. |
Dot plot | A graph that shows frequencies of observations. Similar to histograms in that they are used to show frequencies, but they have the advantage of making it somewhat easier to see how many items are in each column. |
Five-number summary | Shorthand way of reporting five commonly used descriptive stats (minimum, lower quartile, median, upper quartile, maximum). Values in a five-number summary can be calculated on ordinal, interval, or ratio data. |
Forest plot | Special usage of box-and-whisker plots. Includes series of these graphs w/trend line across different plots. |
FPG | Fasting plasma glucose. |
Frequency | Descriptive stat showing the number of observations or the proportion of observations (if expressed as a percentage). |
Histogram | Frequency chart used w/continuous data (interval & ratio). |
Icon displays | Visual representation that utilizes icons to represent concepts (e.g., Cates plots & visual analog scales). |
Inferential statistics | Deriving inferences about a population based on a representative sample. |
Interval data | Measurable observations w/ natural order & equivalence, but no true zero point. Similar to ratio data in that each observation is an equivalent distance from the next. Values in an interval scale can be added & subtracted, but not multiplied or divided. |
IQR (Interquartile Range) | The difference between the upper quartile & lower quartile. |
Likert Scale | A survey response list that is ordinal in nature (5=strongly agree, 4=agree, etc.). Between 5 & 10 response options are most common. |
Maximum | The highest value in an ordered data set. |
Mean | Average of a data set. Usually arithmetic mean is calculated. |
Median | The middle observation in an ordered data set. It is the point at which the data set can be cut in half. When there is an odd number of observations, the median is the one that has an equal number of observations above and below. When there is an even number of observations, the median is the mean of the two middle observations. |
Minimum | The lowest value in an ordered data set. |
Mode | The most frequently occurring value in a data set. Data sets can have more than one mode (multimodal). |
Multimodal | Data set w/more than one mode (more than one value that occurs most frequently). Multimodal distributions are non-normal by definition. |
n | A lowercase n stands for number and represents number of subjects in a sample. |
Nominal data | In positivistic research, nominal data are designated as qualitative because they are not measurable. Nominal data are observations that are assigned names, such as male or female. |
Nonparametric statistics | Stat procedures designed to be used w/data that do not meet assumptions of normality and/or data that are qualitative. Require fewer assumptions than parametric statistics. |
Normal distribution | Set of data w/single center point that represents the mean, median and mode, and whose values are distributed symmetrically on both sides of the mean. Values must be on an interval or ratio scale. A normal distribution looks like a bell curve. |
Observation | A single data point representing a phenomenon of interest. |
Ordinal data | Non-measurable observations that can be placed into a specific order (e.g., a Likert scale). Even though observations have a value relative to one another, there is no natural measurement between them. |
Outliers | An extreme value. Can, by itself, alter the determination of significance. Might also indicate an error has been made. In box-whisker plot, outlier is value that falls above or below the whiskers. |
Paired-samples t-test | AKA “dependent-samples t-test”. Used to determine if a significant difference exists between the means of two dependent (or related) measures that have a similar sample size and degree of variation. |
Parameter | Descriptive measure of a population. Usually cannot be known or measured and a sample must be drawn. Symbols used to represent parameters differ from those used to represent sample statistics. |
Parametric statistics | Stat procedures designed to be used with data that meet the assumption of normality and are quantitative. |
Proportions | Descriptive stats showing the prevalence of a value in terms of the whole. AKA “relative frequency”. |
Qualitative data | Values: categorical, nominal or ordinal scales. Data points counted, & converted to numerical values. |
Quartiles | The division of the distribution into four equal parts. |
Range | The difference between the highest and lowest values in an ordered data set. |
Ratio data | Measurable observations w/ a natural order and equivalence. Each observation is an equivalent distance from the next. There is a true zero point. You can divide one ratio value by another and achieve a meaningful result. This is why the term ratio is used. |
Right skew | Visual description for a non-normal data set in which values trail off to the right side of the chart, creating a “tail” on that side. |
Sample size | Number of subjects who enter into a study. |
Sampling frame | Source from which subjects are drawn (usually a list deemed to have a sufficient number of potential subjects that can represent the population and allow for a representative, random sample to be drawn). |
Scatter plot | Graph that shows relationships (AKA associations/correlations) between variables. The closer the dots in a scatter plot are to forming a line, the stronger their association. |
Standard deviation | Measure of variability. Square root of the variance (the average squared distance of all observations in a data set from the mean). |
Statistics | The theory, study, and practice of quantitatively summarizing data. There are two general categories of statistics: descriptive and inferential. |
Survival curve | Graph used to show the length of time until a bad event, such as death. Can have one line or multiple lines. X-axis shows length of time until bad event. Y-axis shows percentage of group or groups. |
t-test | AKA “Student’s t-test” or “independent-samples t-test”. A parametric statistical test used to determine if a significant difference exists between the means of two independent groups that have a similar sample size and degree of variation. |
Time-to-event curve | AKA “survival curve”. Indicates that a survival curve can be used to measure time until any type of event, not just a bad event. |
Unimodal | A data set w/one mode (a single value that occurs most frequently). Unimodality is one of the required characteristics for a distribution to be considered normal. |
Variance (σ²) | Average squared distance of all observations from the mean of a data set. |
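
Several of the descriptive terms above (five-number summary, IQR, range, central tendency, variance, standard deviation) can be illustrated with a short calculation. The following is a minimal sketch in Python using only the standard library; the function names `five_number_summary` and `spread`, the made-up observations, the "inclusive" quartile convention, and the use of the population (σ²) form of variance are all assumptions chosen for illustration, and other textbooks or tools may compute quartiles slightly differently.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # Quartile conventions differ between textbooks and tools; the
    # "inclusive" method is an assumption made for this sketch.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])


def spread(values):
    """Return range, IQR, population variance and standard deviation."""
    minimum, q1, _, q3, maximum = five_number_summary(values)
    return {
        "range": maximum - minimum,                # highest minus lowest value
        "iqr": q3 - q1,                            # upper minus lower quartile
        "variance": statistics.pvariance(values),  # average squared distance from mean
        "sd": statistics.pstdev(values),           # square root of the variance
    }


# Hypothetical observations, invented purely for illustration.
observations = [2, 4, 4, 4, 5, 5, 7, 9]

print("Five-number summary:", five_number_summary(observations))
print("Mean:", statistics.mean(observations))
print("Median:", statistics.median(observations))
print("Mode(s):", statistics.multimode(observations))
print("Spread:", spread(observations))
```

If the data are treated as a sample rather than a whole population, `statistics.variance` and `statistics.stdev` (which divide by n − 1) would be the usual substitutes.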