Question 1

Name three measures of the central tendency of data.

Accepted Answer

Three measures of the central tendency of data are: mean, median, and mode.

Question 2

How do you calculate the arithmetic mean (mean) of a variable?

Accepted Answer

The arithmetic mean (mean) of a variable is computed by finding the sum of all the values of the variable in the data set and dividing the sum by the number of observations.
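As a minimal sketch of the calculation described above, the following Python function adds up the values and divides by the number of observations. The function name `arithmetic_mean` and the `scores` data are made up for illustration.

```python
def arithmetic_mean(values):
    """Sum all the values and divide by the number of observations."""
    if not values:
        raise ValueError("the mean is undefined for an empty data set")
    return sum(values) / len(values)


scores = [82, 77, 90, 71, 62]  # hypothetical data
print(arithmetic_mean(scores))  # 382 / 5
```

The same arithmetic applies to a population (divide by N) and to a sample (divide by n); only the notation differs.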

Question 3

What symbol is used for the mean of population data and what is the formula?

Accepted Answer

The symbol used is µ, and the formula is: µ = (Σxi)/N, where N is the size of the population.

Question 4

What symbol is used for the mean of sample data and what is the formula?

Accepted Answer

The symbol used is x-bar, and the formula is: x-bar = (Σxi)/n, where n is the size of the sample.

Question 5

What is the median of a variable?

Accepted Answer

The median of a variable is the value that lies in the middle of the data when arranged in ascending order. We use M to represent the median.

Question 6

How is the median of variable data determined?

Accepted Answer

First arrange the values in ascending order. If there is an odd number of data values, take the value in the middle. If there is an even number of values, take the average of the two middle values.
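The steps above can be sketched in Python as follows. The function name `median` is chosen here for illustration; the standard library's `statistics.median` does the same job.

```python
def median(values):
    """Sort the data, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty data set")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 1, 3]))     # odd number of values
print(median([4, 1, 3, 2]))  # even number of values
```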

Question 7

What is a parameter?

Accepted Answer

A descriptive measure of a population, such as µ or σ.

Question 8

What is a statistic?

Accepted Answer

A descriptive measure of a sample, such as x-bar or "s".

Question 9

What is the mode of variable data?

Accepted Answer

The mode is the variable value that occurs most frequently in the data. The data may be bimodal, multimodal, or there may be no mode.
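One way to find the mode in Python is to count how often each value occurs. This sketch assumes the common textbook convention that data in which no value repeats has no mode; the function name `modes` is made up for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency.
    An empty list means there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumption: no value repeats, so no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 2, 2, 3]))     # one mode
print(modes([1, 1, 2, 2, 3]))  # bimodal
print(modes([1, 2, 3]))        # no mode
```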

Question 10

Which of the three measures of central tendency is/are resistant to extreme values?

Accepted Answer

The median and mode are resistant to extreme values. The mean can be significantly influenced by extreme values.

Question 11

What does it mean for a numerical summary, such as the median, to be resistant to extreme values?

Accepted Answer

A numerical summary of data is said to be resistant if extreme values (very large or small) relative to the data do not affect its value substantially.
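A small experiment can make resistance concrete. The sketch below uses Python's standard `statistics` module on made-up salary figures (in thousands) and adds one extreme value to see how far the mean and the median move.

```python
import statistics

salaries = [40, 42, 45, 47, 50]   # hypothetical data, in thousands
with_outlier = salaries + [500]   # add one extreme value

mean_before = statistics.mean(salaries)
median_before = statistics.median(salaries)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print("mean:  ", mean_before, "->", mean_after)
print("median:", median_before, "->", median_after)
```

In this example the mean moves from 44.8 to more than 120, while the median moves only from 45 to 46.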

Question 12

If we collect data and the mean is substantially smaller than the median, what is the likely shape of the distribution?

Accepted Answer

Skewed left

Question 13

If we collect data and the mean and the median are close in value, what is the likely shape of the distribution?

Accepted Answer

Symmetric

Question 14

If we collect data and the mean is substantially larger than the median, what is the likely shape of the distribution?

Accepted Answer

Skewed right
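The three answers above can be turned into a rough rule of thumb in code. This is only a sketch: the function name `likely_shape` and the `tolerance` cutoff for what counts as "close" are assumptions made for illustration, not part of the original material.

```python
import statistics


def likely_shape(values, tolerance=0.1):
    """Guess the shape from the gap between mean and median.
    `tolerance` is a hypothetical cutoff: a gap smaller than this
    fraction of the standard deviation is treated as 'close'."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    spread = statistics.pstdev(values)
    if spread == 0 or abs(mean - median) <= tolerance * spread:
        return "symmetric"
    return "skewed right" if mean > median else "skewed left"


print(likely_shape([1, 2, 3, 4, 5]))
print(likely_shape([1, 2, 2, 3, 30]))
print(likely_shape([1, 28, 29, 29, 30]))
```

A histogram or boxplot remains the better check; comparing the mean and median only suggests the likely shape.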

Question 15

Name four measures that describe the dispersion or spread of variable data.

Accepted Answer

Four measures that describe the dispersion or spread of variable data are: 1) Range, 2) Variance, 3) Standard Deviation, and 4) Interquartile Range.

Question 16

How is the Range of variable data defined?

Accepted Answer

The range, R, of a variable is the difference between the largest data value and the smallest data value. R = Largest Data Value – Smallest Data Value
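In Python the range formula is a one-liner; the function name `data_range` is made up for illustration (it avoids clashing with the built-in `range`).

```python
def data_range(values):
    """Largest data value minus smallest data value."""
    return max(values) - min(values)


print(data_range([7, 2, 10, 4]))  # 10 - 2
```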

Question 17

How is variance computed for population data?

Accepted Answer

The population variance of a variable is the sum of squared deviations about the population mean divided by the number of observations in the population, N.

Question 18

What symbol is used for population variance?

Accepted Answer

The population variance is symbolically represented by σ^2 (lowercase Greek sigma, squared).

Question 19

What is the formula for calculating population variance?

Accepted Answer

σ^2 = Σ (xi – µ)^2/N
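The population variance formula can be sketched in Python as follows, treating the list as the entire population. The function name and the data are made up for illustration; the standard library's `statistics.pvariance` computes the same quantity.

```python
def population_variance(values):
    """Sum of squared deviations about the population mean, divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("variance is undefined for an empty data set")
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


population = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data, mean is 5
print(population_variance(population))  # 32 / 8
```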

Question 20

How is SAMPLE variance determined?

Accepted Answer

The sample variance is computed by finding the sum of squared deviations about the SAMPLE mean and then dividing this result by n – 1.

Question 21

What symbol is used for sample variance?

Accepted Answer

The sample variance is symbolically represented by s^2.

Question 22

Give the formula for calculating sample variance.

Accepted Answer

The formula for calculating sample variance is: s^2 = Σ (xi – x-bar)^2/(n – 1)
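A minimal Python sketch of the sample variance formula follows. The only change from the population version is the divisor, n – 1 instead of N. The function name and the data are made up for illustration; `statistics.variance` in the standard library computes the same quantity.

```python
def sample_variance(values):
    """Sum of squared deviations about the sample mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean is 5
print(sample_variance(sample))     # 32 / 7
```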

Question 23

How does the population variance compare to the population standard deviation?

Accepted Answer

The population standard deviation is obtained by taking the square root of the population variance.

Question 24

What symbol is used for population standard deviation?

Accepted Answer

The population standard deviation is denoted by the Greek letter σ.

Question 25

How does the sample standard deviation compare to the sample variance?

Accepted Answer

The sample standard deviation is obtained by taking the square root of the sample variance.

Question 26

What symbol is used to represent the sample standard deviation?

Accepted Answer

The sample standard deviation is denoted by the symbol "s".

Question 27

Give the formula used to compute sample standard deviation.

Accepted Answer

The formula used to compute sample standard deviation is: s = SQRT[Σ (xi – x-bar)^2/(n – 1)]
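As a sketch, the sample standard deviation can be computed in Python by taking the square root of the sample variance. The function name and the data are made up for illustration; `statistics.stdev` in the standard library gives the same result.

```python
import math


def sample_std(values):
    """Square root of the sample variance (divisor n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation needs at least two observations")
    x_bar = sum(values) / n
    variance = sum((x - x_bar) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
s = sample_std(data)
print(round(s, 3))
```

For the population standard deviation, the divisor would be N rather than n – 1.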

Question 28

According to the Empirical Rule, if a distribution is roughly bell-shaped, what percentage of the data values should fall within ± 1 standard deviation of the mean?

Accepted Answer

According to the Empirical Rule, if a distribution is roughly bell-shaped, approximately 68% of the data will lie within ± 1 standard deviation of the mean. That is, about 68% of the data should fall between µ – 1σ and µ + 1σ.

Question 29

According to the Empirical Rule, if a distribution is roughly bell-shaped, what percentage of the data values should fall within ± 2 standard deviations of the mean?

Accepted Answer

According to the Empirical Rule, if a distribution is roughly bell-shaped, approximately 95% of the data will lie within ± 2 standard deviations of the mean. That is, about 95% of the data should fall between µ – 2σ and µ + 2σ.

Question 30

According to the Empirical Rule, if a distribution is roughly bell-shaped, what percentage of the data values should fall within ± 3 standard deviations of the mean?

Accepted Answer

According to the Empirical Rule, if a distribution is roughly bell-shaped, approximately 99.7% of the data will lie within ± 3 standard deviations of the mean. That is, about 99.7% of the data should fall between µ – 3σ and µ + 3σ.
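The 68%, 95%, and 99.7% figures come from the normal distribution, which is the idealized bell shape. As a sketch, they can be checked with `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
```

Real data that are only roughly bell-shaped will match these percentages approximately, not exactly.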

Question 31

What is a z-score and what is it used for?

Accepted Answer

A z-score is the distance that a value is from the mean, measured in standard deviations. Z-scores are used to standardize data and to compare relative positions.

Question 32

How is a z-score calculated?

Accepted Answer

For population data, to convert a value, X, into a z-score, the formula is: Z = (X – µ)/σ. For sample data, to convert a value, X, into a z-score, the formula is: Z = (X – x-bar)/s.
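The z-score formula can be sketched in Python as follows. The function name and the exam figures are made up for illustration; pass µ and σ for population data, or x-bar and s for sample data.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return (x - mean) / std_dev


# Hypothetical comparison of two exam scores from different classes.
z_exam_a = z_score(85, mean=75, std_dev=5)
z_exam_b = z_score(90, mean=80, std_dev=10)
print(z_exam_a, z_exam_b)
```

Here the score of 85 is relatively stronger: it sits 2 standard deviations above its class mean, while the 90 sits only 1 above.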

Question 33

How is the kth percentile defined?

Accepted Answer

The kth percentile, denoted Pk, of a set of data is a value such that k percent of the observations are less than or equal to the value.
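The definition can be checked directly in code. The sketch below does not compute a percentile (textbooks and software use several different interpolation rules for that); it only reports what percent of the observations are less than or equal to a given value. The function name is made up for illustration.

```python
def percent_at_or_below(values, value):
    """Percent of observations that are less than or equal to `value`."""
    if not values:
        raise ValueError("the data set is empty")
    count = sum(1 for x in values if x <= value)
    return 100 * count / len(values)


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical data
print(percent_at_or_below(data, 3))     # 3 of the 10 values are <= 3
```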

Question 34

What are Quartiles?

Accepted Answer

Quartiles divide data sets into fourths, or four equal parts. For example, the 1st quartile, denoted Q1, divides the bottom 25% of the data from the top 75%. Therefore, the 1st quartile is equivalent to the 25th percentile.

Question 35

What is the Interquartile Range?

Accepted Answer

The interquartile range, denoted IQR, is the range of the middle 50% of the observations in a data set. That is, the IQR is the difference between the first and third quartiles and is found using the formula: IQR = Q3 – Q1.
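As a sketch, the quartiles and the IQR can be computed in Python with the "median of each half" approach found in many introductory texts. That approach is an assumption here: other conventions exist and can give slightly different quartiles. The function name and the data are made up for illustration.

```python
import statistics


def quartiles(values):
    """Return (Q1, median, Q3) using the median of each half.
    With an odd count, the middle value is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half))


data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical data
q1, m, q3 = quartiles(data)
iqr = q3 - q1
print(q1, m, q3, iqr)
```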

Question 36

How is the interquartile range used in identifying outliers?

Accepted Answer

Fences serve as cutoff points for determining outliers. The IQR is used to determine the lower and upper fences as follows: Lower fence = Q1 – 1.5(IQR) and Upper fence = Q3 + 1.5(IQR). A data value below the lower fence or above the upper fence is considered an outlier.
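The fence formulas can be sketched in Python as follows. The quartiles are passed in rather than computed, because the method for finding them varies; the function names and the data are made up for illustration.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) from the first and third quartiles."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values, q1, q3):
    """Values that fall below the lower fence or above the upper fence."""
    lower, upper = fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


data = [1, 3, 5, 7, 9, 11, 40]   # hypothetical data with Q1 = 3 and Q3 = 11
print(fences(3, 11))             # IQR is 8, so the fences are 3 - 12 and 11 + 12
print(outliers(data, 3, 11))
```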

Question 37

What is the "5-Number Summary"?

Accepted Answer

The five-number summary consists of: 1) Minimum Value, 2) Q1, 3) Median, 4) Q3, and 5) Maximum Value.
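A minimal Python sketch of the five-number summary follows. It assumes the "median of each half" convention for Q1 and Q3, which is one of several in use; the function name and the data are made up for illustration.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).
    Q1 and Q3 are the medians of the lower and upper halves;
    with an odd count, the middle value is left out of both halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return (ordered[0],
            statistics.median(lower_half),
            statistics.median(ordered),
            statistics.median(upper_half),
            ordered[-1])


print(five_number_summary([13, 1, 9, 5, 3, 11, 7]))
```

These five values are the ones a boxplot draws: the box spans Q1 to Q3, with a line at the median.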

Question 38

How is a boxplot used?

Accepted Answer

The boxplot is primarily used to identify possible outliers. It can also be used to determine whether a distribution is roughly symmetric, skewed left or skewed right.

Ch3_Summarizing Data

Numerically Summarizing Data