# Statschap1-2

### vocab review

Question | Answer |
---|---|

science of collecting, organizing, summarizing, analyzing and making inferences from data | statistics |

includes collection, organization, summarizing, graphical displays | descriptive statistics |

includes making inferences, hypothesis testing, determining relationships, making predictions | inferential statistics |

the values or measurements that variables describing an event can assume | data |

values that are numeric | quantitative data |

data values that can be placed into distinct categories according to some characteristic or attribute | qualitative data |

assume values that can be counted | discrete variables |

variables that can assume all values between any two given values | continuous variables |

consists of all subjects that are being studied | population |

subset of a population | sample |

a collection of data from every member of a population, rather than from a subset | census |

characteristic or a fact of a population | parameter |

a characteristic or a fact of a sample | statistic |

an organization of raw data into tabular form using classes (or intervals) and frequencies | frequency distribution |

number of times the value occurs in the data set | frequency count |

represent data that can be placed in specific categories, such as gender, hair color, or religious affiliation | categorical frequency distributions |

simply lists the data values with the corresponding number of times or frequency count with which each value occurs | ungrouped frequency distribution |

obtained by dividing the frequency for that class by the total number of observations | relative frequency |

sum of the frequencies for all values at or below the given value | cumulative frequency |

sum of the relative frequencies for all values at or below the given value | cumulative relative frequency |
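
As a rough illustration of the four frequency terms above, the sketch below builds an ungrouped frequency distribution in Python. The data set (pets per household) is made up for the example and does not come from the course material.

```python
from collections import Counter
from itertools import accumulate

# Hypothetical data: number of pets per household (invented for illustration)
data = [0, 1, 1, 2, 2, 2, 3, 3, 4, 1]

counts = Counter(data)          # frequency count of each distinct value
values = sorted(counts)         # distinct values in ascending order
n = len(data)                   # total number of observations

freq = [counts[v] for v in values]
rel_freq = [f / n for f in freq]               # frequency / total observations
cum_freq = list(accumulate(freq))              # running total of frequencies
cum_rel_freq = [c / n for c in cum_freq]       # running total as a proportion

print("value  freq  rel  cum  cum_rel")
for v, f, rf, cf, crf in zip(values, freq, rel_freq, cum_freq, cum_rel_freq):
    print(f"{v:>5} {f:>5} {rf:>5.2f} {cf:>4} {crf:>8.2f}")
```

The last cumulative frequency equals the number of observations, and the last cumulative relative frequency equals 1.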

obtained by constructing classes (or intervals) for the data and then listing the corresponding number of values (frequency count) in each interval | grouped frequency distribution |
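
A grouped frequency distribution can be sketched the same way. The scores and the class width of 10 below are assumptions chosen for the example; in practice the number and width of classes is a judgment call.

```python
from collections import Counter

# Hypothetical exam scores (invented for illustration)
scores = [52, 67, 71, 75, 78, 81, 84, 88, 90, 95]
width = 10  # assumed class width

# Map each score to the lower limit of its class, e.g. 78 -> 70
lower_limits = [(s // width) * width for s in scores]
grouped = Counter(lower_limits)

for lower in sorted(grouped):
    upper = lower + width - 1
    print(f"{lower}-{upper}: {grouped[lower]}")
```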

a plot that displays a dot for each value in a data set along a number line | dot plot |

a graph that uses vertical or horizontal bars to represent the frequencies of the categories in a data set | bar chart / bar graph |

a graphical display of a frequency or a relative frequency distribution that uses classes and vertical bars (rectangles) of various heights to represent the frequencies | histogram |

a graph that displays the data using lines to connect points plotted for the frequencies. The frequencies represent the heights of the vertical bars in the histogram. | frequency polygon |

a data plot that uses part of a data value as the stem to form groups or classes and part of the data value as the leaf. | stem and leaf plot |

displays data that are observed over a given period of time | time-series graph |

a circle that is divided into slices according to the percentage of the data values in each category | pie chart |

a type of bar chart in which the horizontal axis represents categories of interest, usually with the bars ordered from highest to lowest frequency | pareto chart |

the average of the set of values | mean |

the numerical value in the middle when the data set is arranged in order | median |

the most frequently occurring value in the data set | mode |
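
The three measures of center can be checked quickly with Python's standard `statistics` module. The data values are made up for the example.

```python
import statistics

# Hypothetical data set (invented for illustration)
data = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(data)      # sum of values / number of values
median = statistics.median(data)  # middle of the ordered data (average of the two middle values here)
mode = statistics.mode(data)      # most frequently occurring value

print(mean, median, mode)
```

Here the mode (3) is less than the median (4), which is less than the mean (5), so this small data set has its tail to the right.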

most of the data values fall to the left of the mean, and the tail of the distribution is to the right. The mean is to the right of the median, and the mode is to the left of the median. | positively skewed |

most of the data values fall to the right of the mean, and the tail is to the left. Mean is to the left of the median, and the mode is to the right. | negatively skewed |

data values are evenly distributed on both sides of the mean. When the distribution is unimodal, the mean, median and mode are all equal to one another and are located at the center of the distribution. | symmetrical distribution |

the difference between the maximum and minimum data values, is affected by outliers | range |

the difference between the third and first quartiles (Q3 - Q1). It excludes the extremes, a useful feature for highly skewed data. | interquartile range |

the average of the absolute deviations of the data values from the mean | mean deviation |

approximately the average of the squared deviations of the data from the mean (the sample variance divides by n - 1 rather than n) | variance |

most commonly used statistical tool to monitor and control the quality of goods and services, such as consistency in delivery times; positive square root of the variance | standard deviation |

the relative amount of dispersion in a data set. Used to compare data that use different units. | coefficient of variation |
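
A minimal sketch of the measures of spread above, using Python's `statistics` module on made-up data. Two assumptions to note: quartile conventions differ between textbooks and software, so the Q1 and Q3 returned by `statistics.quantiles` (default "exclusive" method) may not match a hand calculation, and the coefficient of variation is computed here from the sample standard deviation.

```python
import statistics

# Hypothetical data set (invented for illustration)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)

# Range: maximum minus minimum
data_range = max(data) - min(data)

# Interquartile range: Q3 minus Q1 (quartile method is an assumption, see note above)
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Mean deviation: average absolute distance from the mean
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Sample variance divides the squared deviations by n - 1
sample_variance = statistics.variance(data)

# Standard deviation: positive square root of the variance
sample_sd = statistics.stdev(data)

# Coefficient of variation: standard deviation relative to the mean, as a percentage
cv_percent = sample_sd / mean * 100

print(data_range, iqr, mean_deviation, sample_variance, sample_sd, cv_percent)
```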

about 68% of the data is within 1SD of the mean, about 95% of the data is within 2SD of the mean, about 99.7% of the data is within 3SD of the mean | Empirical Rule |
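
The percentages in the Empirical Rule come from the normal (bell-shaped) distribution. As a sketch, assuming a standard normal distribution, they can be recovered from its cumulative distribution function; `share_within` is a helper name introduced for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")
```

The three proportions come out close to 0.68, 0.95, and 0.997, matching the rule.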

the mean is less than the median, which is less than the mode (mean < median < mode) | left-skewed distribution |

the mode is less than the median, which is less than the mean (mode < median < mean) | right-skewed distribution |

Used to compare two or more data sets; tells us how many SD a specific value is above or below the mean value of the data set | z score |
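
A z score is the distance of a value from the mean, measured in standard deviations. The sketch below assumes the data are treated as a whole population (so it uses the population standard deviation); with sample data, `statistics.stdev` would be used instead. The function name and data are made up for the example.

```python
import statistics

def z_score(value, data):
    """Number of standard deviations that value lies above or below the mean of data."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # assumption: data is the full population
    return (value - mu) / sigma

# Hypothetical data set with mean 5 and population SD 2
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(z_score(9, data))  # 9 is two SD above the mean
print(z_score(3, data))  # 3 is one SD below the mean
```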

numerical values that divide an ordered data set into 100 groups of values with at most 1 percent of the data values in each group | percentiles |

a graphical display that involves a five number summary of a distribution of values consisting of the minimum value, the lower quartile, the median, the upper quartile, and the maximum value | box plot |
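
The five-number summary behind a box plot can be sketched as follows. The data are made up, and the quartiles depend on the convention used; this example relies on the default "exclusive" method of `statistics.quantiles`, which may differ from a textbook's hand method.

```python
import statistics

# Hypothetical data set (invented for illustration)
data = [1, 3, 5, 7, 9, 11, 13]

# quantiles(n=4) returns the three cut points Q1, median, Q3
q1, median, q3 = statistics.quantiles(data, n=4)

five_number_summary = (min(data), q1, median, q3, max(data))
print(five_number_summary)
```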

the values of the dependent variable are along the vertical axis, and the values of the independent variable are along the horizontal axis | scatter plot |

a statistical relationship between two variables | correlation |
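
One common way to measure the strength of a linear relationship is the Pearson correlation coefficient r. The card above does not name a specific measure, so treating correlation as Pearson's r is an assumption of this sketch; the function name and data are invented for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data: x is the independent variable, y the dependent variable
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(pearson_r(x, y))
```

Values of r near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.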

named data | nominal |

ordered data | ordinal |

uniformly spaced values with no natural zero | interval |

uniformly spaced values with a natural zero | ratio |

the ideal method: every member of the population has an equal chance of being included in the sample, so there is no intentional bias | simple random sample (SRS) |

gather data in the easiest way possible | convenience sample |

divide the population into clusters and randomly select from the clusters | cluster sample |

divide the population into at least two different strata, each with a shared characteristic (such as gender or age group), and then sample from these strata | stratified sample |

starting from some beginning data value, select every nth data value | systematic sample |
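
The sampling methods above (apart from convenience sampling) can be sketched with Python's `random` module. Everything specific here is an assumption made for illustration: the population of 100 ID numbers, the sample sizes, the two strata, and the choice to keep every member of each selected cluster.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ID numbers 1 through 100
population = list(range(1, 101))

# Simple random sample: every member is equally likely to be chosen
srs = rng.sample(population, 10)

# Systematic sample: random starting point, then every 10th member
step = 10
start = rng.randrange(step)
systematic = population[start::step]

# Stratified sample: split into strata by a shared characteristic,
# then draw a random sample from each stratum
strata = {
    "low_ids": [p for p in population if p <= 50],
    "high_ids": [p for p in population if p > 50],
}
stratified = [m for group in strata.values() for m in rng.sample(group, 5)]

# Cluster sample: split into clusters, randomly pick whole clusters
clusters = [population[i:i + 10] for i in range(0, len(population), 10)]
cluster_sample = [m for cluster in rng.sample(clusters, 2) for m in cluster]

print(srs, systematic, stratified, cluster_sample, sep="\n")
```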

