# Stat 1342 Ch 1-3 10

### Statistics

Question | Answer |
---|---|

Inferential Statistics is based on probability. True or False | True |

A _____________ consists of all subjects being studied. | Population |

______________________ is a decision-making process for evaluating claims about a population. | Hypothesis Testing |

Based on her electric bill from last year, Ms. Smith expects her bill to be $75/mo this year. What type of statistics is this? | Inferential |

A _____________ variable assumes values that can be counted. | Discrete |

Quantitative data can be further classified as continuous or nonsequential. True or False | False. Quantitative data are classified as continuous or discrete. |

Rating a restaurant by stars is an example of an ordinal measurement. True or False | True |

Ordinal | Can be classified into categories that can be ranked. |

A person's hair color would be an example of quantitative measurement. True or False | False |

The variable of height is an example of quantitative variable. True or False | True |

The number of birds in a tree is an example of a continuous variable. True or False | False. It is a discrete variable. |

The four basic methods used to obtain samples are: Random, Irregular, Cluster and Stratified. True or False | False. The four methods are random, systematic, cluster, and stratified. |

Find Class Range | Subtract the lower class from the higher class |

Find Frequency | Count how many times each value (or class) occurs in the data set. |

Stratified Sample | Subgroups |

Cluster Sample | All subjects in one or more randomly selected sections (clusters) of a large population |

independent variable | The independent variable is the variable in regression that can be controlled or manipulated. |

dependent variable | The dependent variable is the variable in regression that cannot be controlled or manipulated. |

scatter plot | A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y. |

Rounding Rule for the Correlation Coefficient | Round the value of r to three decimal places. |

What is the range of the linear correlation coefficient? | from −1 to +1. |

If there is a strong positive linear relationship between the variables, where will the value of r be? | the value of r will be close to +1. |

The independent and dependent variables can be plotted on a graph called a ________________________. | scatter plot |

If there is a strong negative linear relationship between the variables, where will the value of r be? | the value of r will be close to −1. |

Standard score or z score | Number of standard deviations that a data value is above or below the mean |

outlier | An outlier is an extremely high or an extremely low data value when compared with the rest of the data values. |

Find the interquartile range: | IQR = Q3 − Q1 |

Boxplot | graph of a data set obtained by drawing a horizontal line from the minimum data value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box passing through the median (Q2). |

coefficient of variation | The coefficient of variation, denoted by CVar, is the standard deviation divided by the mean. The result is expressed as a percentage. |

range rule of thumb | The range can be used to approximate the standard deviation. A rough estimate of the standard deviation is s ≈ range/4 |

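Chebyshev’s theorem | This theorem states that the proportion of values from any data set that fall within k standard deviations of the mean is at least 1 − 1/k², where k is greater than 1. For k = 2, at least three-fourths, or 75%, of the data values will fall within 2 standard deviations of the mean of the data set. |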

empirical rule | When a distribution is bell-shaped (normal), approximately 68% of the data values will fall within 1 standard deviation of the mean; approximately 95% will fall within 2 standard deviations of the mean; and approximately 99.7% will fall within 3 standard deviations of the mean. |

A variable | is a characteristic or attribute that can assume different values. |

Data | are the values (measurements or observations) that the variables can assume. |

census | When data are collected from every subject in the population |

Qualitative variables | are variables that have distinct categories according to some characteristic or attribute. |

Quantitative variables | are variables that can be counted or measured. |

Quantitative variables can be further classified into two groups: | discrete and continuous |

Continuous variables | can assume an infinite number of values between any two specific values. They are obtained by measuring. They often include fractions and decimals. |

Discrete variables | assume values that can be counted. |

measurement scales, and four common types of scales are used: | nominal, ordinal, interval, and ratio. |

The ordinal | level of measurement classifies data into categories that can be ranked; however, precise differences between the ranks do not exist. |

The interval | level of measurement ranks data, and precise differences between units of measure do exist; however, there is no meaningful zero. |

The ratio | level of measurement possesses all the characteristics of interval measurement, and there exists a true zero. In addition, true ratios exist when the same variable is measured on two different members of the population. |

A systematic | sample is a sample obtained by selecting every kth member of the population where k is a counting number. |

A random sample | is a sample in which all members of the population have an equal chance of being selected. |

A stratified sample | is a sample obtained by dividing the population into subgroups or strata according to some characteristic relevant to the study. Then subjects are selected at random from each subgroup. |

A cluster sample | is obtained by dividing the population into sections or clusters and then selecting one or more clusters at random and using all members in the cluster(s) as the members of the sample. |

observational study | the researcher merely observes what is happening or what has happened in the past and tries to draw conclusions based on these observations. |

experimental study | the researcher manipulates one of the variables and tries to determine how the manipulation influences other variables. |

confounding variable | is one that influences the dependent or outcome variable but was not separated from the independent variable. |

raw data | When the data are in original form |

A frequency distribution | is the organization of raw data in table form, using classes and frequencies. |

The categorical frequency distribution | is used for data that can be placed in specific categories, such as nominal- or ordinal-level data. |

grouped frequency distribution | When the range of the data is large, the data must be grouped into classes that are more than one unit in width |

the class width for a class in a frequency distribution is found by | subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. |

The class midpoint | Xm is obtained by adding the lower and upper boundaries and dividing by 2, or adding the lower and upper limits and dividing by 2 |

A cumulative frequency distribution | is a distribution that shows the number of data values less than or equal to a specific value (usually an upper boundary). The values are found by adding the frequencies of the classes less than or equal to the upper class boundary of a specific class. |

The three most commonly used graphs in research are | The histogram, The frequency polygon, The cumulative frequency graph, or ogive (pronounced o-jive). |

The histogram | is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes. |

The frequency polygon | is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. The frequencies are represented by the heights of the points. |

The ogive | is a graph that represents the cumulative frequencies for the classes in a frequency distribution |

A bar graph | represents the data by using vertical or horizontal bars whose heights or lengths represent the frequencies of the data. |

A Pareto chart | is used to represent a frequency distribution for a categorical variable, and the frequencies are displayed by the heights of vertical bars, which are arranged in order from highest to lowest. |

A time series graph | represents data that occur over a specific period of time. |

A pie graph | is a circle that is divided into sections or wedges according to the percentage of frequencies in each category of the distribution. |

A dotplot | is a statistical graph in which each data value is plotted as a point (dot) above the horizontal axis. |

A stem and leaf plot | is a data plot that uses part of the data value as the stem and part of the data value as the leaf to form groups or classes. |

A scatter plot | is a graph of ordered pairs of data values that is used to determine if a relationship exists between the two variables. |

A negative linear relationship | exists when the points fall approximately in a descending straight line from left to right. The relationship then is that as the x values are increasing, the y values are decreasing, or vice versa. |

A positive linear relationship | exists when the points fall approximately in an ascending straight line and both the x and y values increase at the same time. The relationship then is that as the values for the x variable increase, the values for the y variable increase as well. |

A nonlinear relationship | exists when the points fall in a curved line. The relationship is described by the nature of the curve. |

A statistic | is a characteristic or measure obtained by using the data values from a sample. |

A parameter | is a characteristic or measure obtained by using all the data values from a specific population. |

General Rounding Rule | In statistics the basic rounding rule is that when computations are done in the calculation, rounding should not be done until the final answer is calculated. |

The mean | is the sum of the values, divided by the total number of values. |

The sample mean, | denoted by X̄ (pronounced “X bar”), is calculated by using sample data. The sample mean is a statistic. |

The population mean, | denoted by μ (pronounced “mew”), is calculated by using all the values in the population. The population mean is a parameter. |

Rounding Rule for the mean | The mean should be rounded to one more decimal place than occurs in the raw data. |

data array | When the data set is ordered |

The median | is the midpoint of the data array. The symbol for the median is MD. |

mode | The value that occurs most often in a data set. |

unimodal | A data set that has only one value that occurs with the greatest frequency |

bimodal | If a data set has two values that occur with the same greatest frequency, both values are considered to be the mode. |

multimodal | If a data set has more than two values that occur with the same greatest frequency, each value is used as the mode. |

The midrange | is the sum of the lowest and highest values in the data set, divided by 2. The symbol MR is used for the midrange. |

weighted mean | is used when the values are not all equally represented. |

Find the weighted mean of a variable | X by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights. |

The range | is the highest value minus the lowest value. The symbol R is used for the range. |

The population variance is the average of | the squares of the distance each value is from the mean. |

The population standard deviation | is the square root of the variance. |

Rounding Rule for the Standard Deviation | The rounding rule for the standard deviation is the same as that for the mean. The final answer should be rounded to one more decimal place than that of the original data. |

A z score or standard score for a value is obtained by... | subtracting the mean from the value and dividing the result by the standard deviation. The symbol for a standard score is z. |

The z score represents ... | the number of standard deviations that a data value falls above or below the mean. |

Percentiles | divide the data set into 100 equal groups. |

Quartiles | divide the distribution into four equal groups, separated by Q1, Q2, Q3. |

The interquartile range (IQR) | is the difference between the third and first quartiles. |
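
As a rough check on several of the formulas in the cards above (mean, median, midrange, range, population variance and standard deviation, coefficient of variation, z score, weighted mean, range rule of thumb, and Chebyshev's theorem), the following Python sketch computes them for a small made-up data set. The function names and the sample values are illustrative assumptions and are not part of the original deck.

```python
import statistics


def describe(values):
    """Compute several measures from the cards for a list of numbers."""
    ordered = sorted(values)  # the "data array"
    n = len(ordered)
    mean = sum(ordered) / n
    # Population variance: average of the squared distances from the mean.
    variance = sum((x - mean) ** 2 for x in ordered) / n
    std_dev = variance ** 0.5
    return {
        "mean": mean,
        "median": statistics.median(ordered),
        "midrange": (ordered[0] + ordered[-1]) / 2,
        "range": ordered[-1] - ordered[0],
        "variance": variance,
        "std_dev": std_dev,
        # Coefficient of variation, expressed as a percentage.
        "cvar_percent": std_dev * 100 / mean,
    }


def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above or below the mean."""
    return (value - mean) / std_dev


def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def range_rule_estimate(values):
    """Range rule of thumb: rough standard deviation estimate, range / 4."""
    return (max(values) - min(values)) / 4


def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


# Hypothetical sample data, chosen so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(data)
print(summary)
print(z_score(9, summary["mean"], summary["std_dev"]))  # 2.0
print(range_rule_estimate(data))                         # 1.75
print(chebyshev_lower_bound(2))                          # 0.75
```

For this data set the mean is 5 and the population standard deviation is 2, so the value 9 has a z score of 2. The range rule of thumb gives 1.75, which is only a rough estimate of the actual standard deviation.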