Question 1

____ is a term used to describe information that derives from some form of measurement (counting, sorting, ranking, and so on).

Accepted Answer

Data

Question 2

True or false: the strength of the bootstrap method is that it can estimate the shape and spread for any distribution of interest.

Accepted Answer

True

Question 3

You can classify data as either quantitative or ___________.

Accepted Answer

Qualitative

Question 4

You can further classify quantitative data as discrete or _________.

Accepted Answer

Continuous

Question 5

True or false: recognizing quantitative data as discrete or continuous is another useful skill in statistics

Accepted Answer

True

Question 6

When a data set is at a __________ level, the data consists exclusively of categories (e.g., names, eye color, labels, gender, geographic location).

Accepted Answer

Nominal

Question 7

True or false: when the highest level of a data set is *ordinal level*, the data consists of categories that can be arranged in some meaningful order according to their relative size or quality

Accepted Answer

True

Question 8

A study is done to determine if the color of apples affects the purchase rate by consumers. Purchase counts for 3 colors of apples are collected. Which technique should be used to determine if there is a difference in purchase rate by color?

Accepted Answer

ANOVA

Question 9

The original scores are compared with the second scores to determine whether there was decline.  What is the appropriate model to test the hypothesis? Hint: Two-______ t-test.

Accepted Answer

Sample

Question 10

When performing ANOVA on a data set, the error terms follow a _______ probability distribution.

Accepted Answer

Normal

Question 11

Machine learning frees humans and depends on ____________

Accepted Answer

Algorithms

Question 12

One of the objectives in Data Mining is to discover _________ within a given data set

Accepted Answer

Patterns

Question 13

The technical term for "data about data"

Accepted Answer

Metadata

Question 14

What does the "A" in API stand for?

Accepted Answer

Application

Question 15

_________ reveals features distinguishing one class of data objects from the other, leading to new discoveries

Accepted Answer

Clustering

Question 16

The difference between classification and clustering is that classification starts with ____________ labels, while for clustering the labels are created after the fact.

Accepted Answer

Predefined

Question 17

Developing a reasonable understanding of ____________ is a must for data scientists.

Accepted Answer

Statistics

Question 18

True or false:  R^2 approaching 1 indicates a good fit for the model.

Accepted Answer

True

Question 19

The following formula is how the ______________ Sum of Squares is defined: SSE = ∑(Y − Y-hat)^2

Accepted Answer

Residual
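
As a minimal sketch of the formula above, the residual sum of squares can be computed directly from observed and fitted values. The function names and sample numbers are illustrative assumptions, not course material; the R^2 helper uses the standard identity R^2 = 1 - SSE/SST.

```python
def residual_sum_of_squares(y, y_hat):
    """SSE: sum of squared differences between observed and fitted values."""
    return sum((obs - fit) ** 2 for obs, fit in zip(y, y_hat))


def r_squared(y, y_hat):
    """R^2 = 1 - SSE / SST, where SST is the total sum of squares about the mean."""
    mean_y = sum(y) / len(y)
    sst = sum((obs - mean_y) ** 2 for obs in y)
    return 1 - residual_sum_of_squares(y, y_hat) / sst


# Hypothetical observed values and model predictions.
y = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.5, 3.5, 6.5, 7.5]

print(residual_sum_of_squares(y, y_hat))
print(r_squared(y, y_hat))
```

A small SSE relative to the total variation pushes R^2 toward 1, which is the sense in which a value near 1 indicates a good fit.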

Question 20

An ___________  Data Warehouse (EDW) is a specialized database that focuses on analyzing data.

Accepted Answer

Enterprise

Question 21

An organization is generally working with "big data" when all four "Vs" are present: Volume, Variety, Velocity, and ____________

Accepted Answer

Veracity

Question 22

If you use a relational database, then you're going to be limited mostly to structured data. If you use a NoSQL cluster, then you'll be able to work with all types of data, but it will be more difficult to create __________.

Accepted Answer

Reports

Question 23

The degree that two things are related

Accepted Answer

Correlation

Question 24

True or false: with a correlation coefficient, the closer the value is to 1 or -1, the stronger the relationship.

Accepted Answer

True
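
To make the "closer to 1 or -1" idea concrete, here is a hedged sketch of the Pearson correlation coefficient written with the standard library only. The data sets are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the product of the spreads."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: a perfect increasing line, a perfect decreasing line,
# and a noisy series with only a weak relationship to x.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]
noisy = [3, 1, 4, 1, 5]

print(pearson_r(x, rising))
print(pearson_r(x, falling))
print(pearson_r(x, noisy))
```

The sign gives the direction of the relationship and the magnitude gives its strength.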

Question 25

Relative risk is the _____ of the risks of an outcome for two groups.

Accepted Answer

Ratio
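
A short sketch of that ratio, assuming each group is summarized by a count of outcomes and a group size. The function name and the numbers are hypothetical.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Risk in group A divided by risk in group B."""
    risk_a = events_a / total_a
    risk_b = events_b / total_b
    return risk_a / risk_b


# Hypothetical trial: 10 of 100 treated patients and 20 of 100 placebo
# patients experience the outcome.
rr = relative_risk(10, 100, 20, 100)
print(rr)
```

A relative risk below 1 means the outcome is less likely in the first group; a value of 1 means the risks are equal.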

Question 26

A researcher uses a chi-square test to find χ² = 6.78. Which conclusion can be drawn from this?

Accepted Answer

The χ² value alone does not allow conclusions; a p-value must be calculated.
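
One way to see why the statistic alone is not enough: the p-value also depends on the degrees of freedom. The sketch below uses closed forms that hold only for 1 and 2 degrees of freedom (an assumption made to stay within the standard library); a statistics package would be needed for the general case.

```python
import math


def chi2_p_value(statistic, df):
    """Upper-tail p-value of a chi-square statistic for df = 1 or df = 2 only."""
    if df == 1:
        # A chi-square variable with 1 df is a squared standard normal.
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        # A chi-square variable with 2 df is exponential with mean 2.
        return math.exp(-statistic / 2)
    raise ValueError("this sketch only covers df = 1 and df = 2")


p_df1 = chi2_p_value(6.78, 1)
p_df2 = chi2_p_value(6.78, 2)
print(p_df1, p_df2)
```

The same statistic gives a different p-value for each df, so the conclusion depends on both the table's dimensions and the chosen significance level.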

Question 27

An analyst was asked to analyze the test case of effectiveness of reducing a side effect, comparing the treatment versus placebo group on the result of having a side effect or not.  What is the proper method to use in this analysis?

Accepted Answer

Logistic regression

Question 28

A data analyst is performing logistic regression on elements with high multicollinearity.  What is an assumption of logistic regression that is being violated?

Accepted Answer

Variable independence

Question 29

What is the correct assumption for the logistic regression model?

Accepted Answer

The error terms for the variables are independent.

Question 30

True or false: z-Tests are commonly done when the population SD is known.

Accepted Answer

True

Question 31

True or false: t-Tests are commonly done when the population SD is unknown.

Accepted Answer

True

Question 32

True or false: 2 conditions for using a t-Test are: sigma is unknown, and n < 30

Accepted Answer

True

Question 33

Participants are asked to rate severity. Responses are rated on a Likert-type scale (1-5). The randomly assigned outcomes are not normally distributed. A ____-_____ (non-parametric) test should be used.

Accepted Answer

rank-based

Question 34

True or false: the strength of the Bootstrap method is that it can estimate the shape and spread of any distribution of interest.

Accepted Answer

True

Question 35

Which test should be used when finding whether the outcome is success or failure where: a) the experiment consists of a fixed number of trials, b) each trial only has two possible outcomes, c) each trial is independent of any other trial?

Accepted Answer

Binomial test

Question 36

Apple color is studied to determine consumer purchase rate. Three colors are used and counts taken. Which technique is best?

Accepted Answer

ANOVA

Question 37

A pre-test and a post-test are administered to determine whether a significant change is observed. Which test is best?

Accepted Answer

Two-sample t-test

Question 38

Which assumptions are made using a linear regression model?

Accepted Answer

Homoscedasticity; and B0, B1, and variance are constant

Question 39

If a researcher uses an ANOVA with a minimum sampling size of 30, what will need to be calculated?

Accepted Answer

Mean & Standard deviation

Question 40

True or false: if a multiple regression model needs to be constructed, dependent error terms violate the standard model assumptions.

Accepted Answer

True (you want them independent!)

Question 41

What are the 2 conditions that must be met in order to use a t-test?

Accepted Answer

1. The population standard deviation must be unknown. 2. The sample size must be < 30.

Question 42

What is the z-test formula?

Accepted Answer

z = (x̄ - μ) / (σ / sqrt(n)), where σ is the known population standard deviation
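
A hedged sketch of the z statistic, z = (x̄ - μ) / (σ / sqrt(n)), assuming σ is the known population standard deviation. The helper names and the example numbers are illustrative; the p-value uses the standard normal distribution from Python's statistics module.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, mu, sigma, n):
    """Distance of the sample mean from mu, in standard-error units."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))


def two_sided_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical example: sample mean 105 from n = 36, tested against
# mu = 100 with known population SD sigma = 15.
z = z_statistic(105, 100, 15, 36)
p = two_sided_p_value(z)
print(z, p)
```

Note the parentheses around the numerator: without them the formula would subtract the whole fraction from x̄ instead of scaling the difference.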

Question 43

True or false: in an f-test, if R-squared approaches 1, that indicates a good fit for the model

Accepted Answer

True

Question 44

If the goal is to obtain an accurate p-value for the effectiveness of two medications, which method should be used? (Hint: "accurate")

Accepted Answer

Fisher's exact test

Question 45

Which test should be used if examining the relationship among 3 categories?

Accepted Answer

Chi-square

Question 46

When performing a hypothesis test for slope of a regression, which is the appropriate test to use?

Accepted Answer

One-sample t-test

Question 47

True or false: the Kruskal Wallis test is a rank-based non-parametric test to find statistical differences between two or more groups of an independent variable

Accepted Answer

True

Question 48

True or false: the f-test is a test designed to test if two population variances are equal

Accepted Answer

True

Question 49

True or false: the overall F-test considers all the regression parameters other than the intercept parameter, β0.

Accepted Answer

True

Question 50

True or false: the One-way t-test is a test used to estimate the mean of a sample population

Accepted Answer

True

Question 51

A Probability Mass Function (pmf) utilizes which type of variable?

Accepted Answer

a discrete random variable

Question 52

True or false: Akaike Information Criterion (AIC) is a measure commonly used to compare models with a different number of parameters.

Accepted Answer

True

Question 53

True or false: standard deviation is the square root of the variance.

Accepted Answer

True
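
A quick check of this relationship using Python's statistics module. The data values are made up, and the population versions (pvariance, pstdev) are used so the two numbers correspond exactly.

```python
import math
import statistics

# Hypothetical data set with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(data)  # population variance
std_dev = statistics.pstdev(data)      # population standard deviation

print(variance, std_dev)
print(math.isclose(std_dev, math.sqrt(variance)))
```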

Question 54

True or false: the Hypergeometric distribution models the number of events in a certain number of trials (draws). Once selected, an item is no longer available for selection (known as "draws without replacement").

Accepted Answer

True

Question 55

True or false: the Beta distribution models probability for proportions.

Accepted Answer

True

Question 56

The Uniform distribution models possible values which fall within a range of _____ probabilities.

Accepted Answer

equal

Question 57

True or false: the sampling distribution of the mean is the distribution of sample means when taking random samples of the same size.

Accepted Answer

True

Question 58

True or false: the t-test is a method for hypothesis testing for the mean of a sample taken from a normally distributed population when the population standard deviation is known.

Accepted Answer

False (unknown)

Question 59

True or false: *bootstrapping* is a resampling technique that selects a set of data points, allowing the same data point to potentially be selected more than once.

Accepted Answer

True
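
The resampling idea can be sketched as follows, assuming the statistic of interest is the mean. The function name, the seed parameter, and the data are hypothetical; random.choices draws with replacement, so the same point can appear more than once in a resample.

```python
import random
import statistics


def bootstrap_means(data, n_resamples, seed=0):
    """Return the mean of each bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(data, k=len(data))
        means.append(statistics.mean(resample))
    return means


# Hypothetical sample of eight measurements.
data = [12, 15, 9, 20, 14, 11, 18, 16]
means = bootstrap_means(data, 1000)

# The spread of the bootstrap means estimates the standard error of the mean.
standard_error = statistics.stdev(means)
print(standard_error)
```

The list of resampled means approximates the sampling distribution of the mean, so its shape and spread can be inspected without assuming a particular distribution.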

Question 60

The Wilcoxon rank-sum test can be performed using ______ data.

Accepted Answer

ranked

Question 61

The Kruskal-Wallis test is another rank-based statistical test. It is a ______________ equivalent to a one-way analysis of variance.

Accepted Answer

non-parametric

Question 62

Fisher's exact test is a method of calculating the exact _______ for contingency tables

Accepted Answer

p-value
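
As a sketch of how the exact p-value can be obtained for a 2x2 table, the code below enumerates every table with the same margins and uses hypergeometric probabilities. It assumes the common two-sided convention of summing all tables no more probable than the observed one; the function names and the example table are illustrative.

```python
from math import comb


def hypergeom_prob(a, row1, row2, col1):
    """Probability that the top-left cell equals a, with all margins fixed."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)


def fisher_exact_two_sided(table):
    """Two-sided exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    p_observed = hypergeom_prob(a, row1, row2, col1)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    total = 0.0
    for k in range(lowest, highest + 1):
        p = hypergeom_prob(k, row1, row2, col1)
        # Small tolerance so ties with the observed table are counted.
        if p <= p_observed * (1 + 1e-9):
            total += p
    return total


# Hypothetical 2x2 table: rows are groups, columns are outcomes.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
print(p_value)
```

Because every possible table is enumerated, the cost grows with the sample size, which is why an approximation is often preferred for large samples.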

Question 63

Computing probabilities for all possible tables is very time-consuming, so an approximation, the ____________ test, is sometimes used when the sample size is large.

Accepted Answer

chi-squared
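
For comparison, the approximate test needs only one pass over the table. This is a minimal sketch of the Pearson chi-squared statistic, with expected counts taken from the row and column totals; the names and the example table are assumptions for illustration.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


def degrees_of_freedom(table):
    """(rows - 1) * (columns - 1) for a contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2x2 table of observed counts.
table = [[10, 20], [20, 10]]
print(chi_square_statistic(table), degrees_of_freedom(table))
```

The statistic is then compared against a chi-squared distribution with the computed degrees of freedom to obtain a p-value.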

Question 64

True or false: a Principal Component Analysis (PCA) transforms variables into possibly more components to simplify subsequent analyses.

Accepted Answer

False (fewer)

WGU-741

Fundamentals of Data Analysis

Question	Answer
A standardized variable is a variable that has been adjusted to have a mean of zero and a standard deviation of ______.	one
A principal component loading plot displays the ______ of each input variable between a pair of principal components.	weight
True or false: Factor loading is the correlation between the original data and the new principal component	True
If time dependence exists in the errors, the errors are said to be ______________.	autocorrelated
True or false: the coefficient of determination, denoted R^2, measures the proportion of total variation in the response variable, Y, that is accounted for by the linear regression model.	True
The category represented by the omitted indicator variable is the __________ ______ of the categorical predictor.	reference level
True or false: an interaction term is two predictor variables multiplied together and used as an additional predictor term in a multiple linear regression model	True
True or false: a Loess smooth curve is an estimate of the data trend obtained by predicting each data point using the points nearby.	True
_____________ is the percentage of correctly classified observations with the desired outcome.	Sensitivity
_____________ is the percentage of correctly classified observations WITHOUT the desired outcome.	Specificity
True or false: when there is an even number of terms, the median is the mean of the middle two terms.	True
A _______________ distribution has a peak of high-frequency data on the right with a tail of low-frequency data on the left side.	left-skewed
Rolling less than a 3 on a six-sided die is best described as _____. a) a simple event b) a compound event	b) a compound event
The Poisson distribution models the number of _______ that occur in an interval (a count). Example: the number of emails received each day.	events
_______ is a statistical method used to compare the means of two or more groups.	ANOVA
True or false: a t-test is a statistic that checks if two means are reliably different from each other.	True
The concept of __________ ____________ formalizes the intuition of making inferences about populations from observed sample statistics.	statistical significance
If the p-value is less than a threshold known as the ______________ (written as alpha), then one rejects the null hypothesis in favor of the alternative hypothesis.	significance level
True or false: the f-distribution is a non-symmetric, continuous probability distribution that is used to perform inference on two population means	False (variances)
True or false: the f-distribution is right-skewed and unimodal	True
True or false: the Probability Mass Function can be defined as P(X=x)	True
A _________________ describes characteristics of a population.	parameter
True or false: a chi-square test compares an expected data set with an observed data set.	True
A Logistic Regression Model is one such that the dependent variable is ____________.	categorical
True or false: the t-distribution is used in place of the normal distribution in situations where the sample size (n) is small (<30) or the population standard deviation (σ) is unknown.	True
Why must the ANOVA method be used on 3 or more groups?	Because there are variations within each group (columns) AND variations between each group (rows)
Name the best test when conducting a Pre-test and Post-test from the same population?	Two-sample T-test
True or false: the one-way t-test is used to estimate the mean of a sample population	True
When conducting an experiment that involves a treatment group vs. a control group, what type of regression applies?	Logistic regression
Which test would you use when the relevant data is found in a contingency table?	Fisher's exact test
Fisher's exact test uses the __________________ distribution to calculate the probability that the observed contingency table (or one even more unusual) would occur under the null hypothesis.	hypergeometric
Twenty-five adults between 65 and 80 are selected to eat blackberries for six months and be retested. The original scores are compared with the second scores to determine whether there was decline. Which test is appropriate?	Two-sample t-test
A random sample of 30 students from Major One and 30 students from Major Two are selected to compare GPAs. The researcher decides to use an ANOVA test for this study. What will the researcher need to calculate?	Group means and standard deviations
What are data mining and business analytics used for?	Collecting and analyzing data for better decision making with the goal of solving business-related problems.
An example of a test to use when you have 2 discrete variables is: A) Kruskal-Wallis, B) ANOVA, C) Chi-squared	Chi-squared
True or false. In the field of statistics, we study samples because collecting data for an entire population is generally not feasible.	True