Bus Stats Exam 1
Business Stats Ch 1-3
| Term | Definition |
|---|---|
| Analytics | The scientific process of transforming data into insight for making better decisions. |
| Big Data | A set of data that cannot be managed, processed, or analyzed with commonly available software in a reasonable amount of time. Big data are characterized by great volume. |
| Categorical Data | Labels or names used to identify an attribute of each element. Categorical data use either the nominal or ordinal scale of measurement and may be nonnumeric or numeric. |
| Census | A survey to collect data on the entire population |
| Cross-Sectional Data | Data collected at the same or approximately the same point in time |
| Data | The facts and figures collected, analyzed, and summarized for presentation and interpretation |
| Data Mining | The process of using procedures from statistics and computer science to extract useful information from extremely large databases. |
| Data Set | All the data collected in a particular study |
| Descriptive Analytics | The set of analytical techniques that describe what has happened in the past |
| Descriptive Statistics | Tabular, graphical, and numerical summaries of data |
| Elements | The entities on which data are collected |
| Interval Scale | The scale of measurement for a variable if the data demonstrate the properties of ordinal data and the interval between values is expressed in terms of a fixed unit of measure. Interval data are always numeric |
| Nominal Scale | The scale of measurement for a variable when the data are labels or names used to identify an attribute of an element. Nominal data may be nonnumeric or numeric |
| Observation | The set of measurements obtained for a particular element |
| Ordinal Scale | The scale of measurement for a variable if the data exhibit the properties of nominal data and the order or rank of the data is meaningful. Ordinal data may be nonnumeric or numeric |
| Population | The set of all elements of interest in a particular study |
| Predictive Analytics | The set of analytical techniques that use models constructed from past data to predict the future or assess the impact of one variable on another |
| Prescriptive Analytics | The set of analytical techniques that yield a best course of action |
| Quantitative Data | Numeric values that indicate how much or how many of something. Quantitative data are obtained using either the interval or ratio scale of measurement |
| Quantitative Variable | A variable with quantitative data |
| Ratio Scale | The scale of measurement for a variable if the data demonstrate all the properties of interval data and the ratio of two values is meaningful. Ratio data are always numeric |
| Sample | A subset of the population |
| Sample Survey | A survey to collect data on a sample |
| Statistical Inference | The process of using data obtained from a sample to make estimates or test hypotheses about the characteristics of a population |
| Statistics | The art and science of collecting, analyzing, presenting, and interpreting data |
| Time Series Data | Data collected over several time periods |
| Variable | A characteristic of interest for the elements |
| Bar Chart | A graphical device for depicting categorical data that have been summarized in a frequency, relative frequency, or percent frequency distribution |
| Categorical Data | Labels or names used to identify categories of like items |
| Class Midpoint | The value halfway between the lower and upper class limits |
| Crosstabulation | A tabular summary of data for two variables. The classes for one variable are represented by the rows; the classes for the other variable are represented by the columns |
| Cumulative Frequency Distribution | A tabular summary of quantitative data showing the number of data values that are less than or equal to the upper class limit of each class. |
| Cumulative Percent Frequency Distribution | A tabular summary of quantitative data showing the percentage of data values that are less than or equal to the upper class limit of each class |
| Cumulative Relative Frequency Distribution | A tabular summary of quantitative data showing the fraction or proportion of data values that are less than or equal to the upper class limit of each class |
| Data Dashboard | A set of visual displays that organizes and presents information that is used to monitor the performance of a company or organization in a manner that is easy to read, understand, and interpret |
| Data Visualization | A term used to describe the use of graphical displays to summarize and present information about a data set |
| Dot Plot | A graphical device that summarizes data by the number of dots above each data value on the horizontal axis |
| Frequency Distribution | A tabular summary of data showing the number (frequency) of observations in each of several nonoverlapping categories or classes |
| Histogram | A graphical display of a frequency distribution |
| Percent Frequency Distribution | A tabular summary of data showing the percentage of observations in each of several nonoverlapping classes |
| Pie Chart | A graphical device for presenting data summaries based on subdivision of a circle into sectors that correspond to the relative frequency for each class |
| Relative Frequency Distribution | A tabular summary of data showing the fraction or proportion of observations in each of several nonoverlapping categories or classes |
| Scatter Diagram | A graphical display of the relationship between two quantitative variables. One variable is shown on the horizontal axis and the other variable is shown on the vertical axis |
| Side-by-Side Bar Chart | A graphical display for depicting multiple bar charts on the same display |
| Simpson's Paradox | Conclusions drawn from two or more separate crosstabulations that can be reversed when the data are aggregated into a single crosstabulation |
| Stacked Bar Chart | A bar chart in which each bar is broken into rectangular segments of a different color showing the relative frequency of each class in a manner similar to a pie chart |
| Stem-and-Leaf Display | A graphical display used to show simultaneously the rank order and shape of a distribution of data |
| Trend Line | A line that provides an approximation of the relationship between two variables |
| Box Plot | A graphical summary of data based on a five-number summary |
| Chebyshev's Theorem | A theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean |
| Coefficient of Variation | A measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100 |
| Correlation Coefficient | A measure of linear association between two variables that takes on values between −1 and +1. |
| Covariance | A measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship. |
| Empirical Rule | A rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell-shaped distribution |
| Five-Number Summary | A technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value |
| Geometric Mean | A measure of location that is calculated by finding the nth root of the product of n values |
| Interquartile Range | A measure of variability, defined to be the difference between the third and first quartiles. |
| Mean | A measure of central location computed by summing the data values and dividing by the number of observations. |
| Median | A measure of central location provided by the value in the middle when the data are arranged in ascending order |
| Mode | A measure of location, defined as the value that occurs with greatest frequency |
| Outlier | An unusually small or unusually large data value |
| Percentile | A value such that at least p percent of the observations are less than or equal to this value and at least (100 − p) percent of the observations are greater than or equal to this value |
| Point Estimator | A sample statistic, such as $\bar{x}$, $s^2$, and $s$, used to estimate the corresponding population parameter |
| Population Parameter | A numerical value used as a summary measure for a population |
| Quartiles | The 25th, 50th, and 75th percentiles, referred to as the first quartile, the second quartile (median), and third quartile, respectively. The quartiles can be used to divide a data set into four parts |
| Range | A measure of variability, defined to be the largest value minus the smallest value. |
| Sample Statistic | A numerical value used as a summary measure for a sample |
| Skewness | A measure of the shape of a data distribution. Data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right result in positive skewness |
| Standard Deviation | A measure of variability computed by taking the positive square root of the variance |
| Variance | A measure of variability based on the squared deviations of the data values about the mean. |
| Weighted Average | The mean obtained by assigning each observation a weight that reflects its importance |
| Z-Score | A value computed by dividing the deviation about the mean $(x_i - \bar{x})$ by the standard deviation $s$. A z-score is referred to as a standardized value and denotes the number of standard deviations $x_i$ is from the mean |
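
Several of the numerical measures defined above (mean, median, mode, range, variance, standard deviation, coefficient of variation, z-score, and interquartile range) can be checked with a short script. The sketch below uses Python's standard `statistics` module. The five data values are hypothetical and were chosen only to give round numbers. The quartile calculation assumes the module's default "exclusive" interpolation method, which may not match the method used in a particular course or textbook.

```python
import statistics

# Hypothetical sample of five observations (an assumption, not from the flashcards).
data = [46, 54, 42, 46, 32]

# Measures of location
mean = statistics.mean(data)        # sum of the values divided by the number of observations
median = statistics.median(data)    # middle value once the data are sorted
mode = statistics.mode(data)        # value that occurs with greatest frequency

# Measures of variability
data_range = max(data) - min(data)      # largest value minus smallest value
variance = statistics.variance(data)    # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)        # positive square root of the sample variance
coef_var = std_dev / mean * 100         # standard deviation as a percentage of the mean

# Z-scores: number of standard deviations each value lies from the mean
z_scores = [(x - mean) / std_dev for x in data]

# Quartiles and interquartile range.
# Assumption: the default "exclusive" method of statistics.quantiles.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "variance:", variance, "std dev:", std_dev)
print("coefficient of variation (%):", round(coef_var, 2))
print("z-scores:", z_scores)
print("quartiles:", q1, q2, q3, "IQR:", iqr)
```

Because `statistics.variance` and `statistics.stdev` divide by n − 1, they compute the sample statistics $s^2$ and $s$; the population versions are `statistics.pvariance` and `statistics.pstdev`.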