Question 1

What is "Scoring"? (Hint: used in marketing.)

Accepted Answer

The probability that a customer will behave in a certain way in the future

Question 2

Segmenting the population into groups that are homogeneous (similar) in order to construct a specific model for each segment is called: A) Stratification of models, B) Unsupervised segmentation, C) PCA

Accepted Answer

A) Stratification of models

Question 3

What is "unsupervised" segmentation?

Accepted Answer

Segmenting where general characteristics of the sample have no direct relationship with the dependent variable

Question 4

The Shapiro–Wilk test tests the ____ __________ that a given sample comes from a normally distributed population.

Accepted Answer

null hypothesis

Question 5

True or false. The Kolmogorov-Smirnov test is a statistical hypothesis test.

Accepted Answer

True

Question 6

For variables that are qualitative or discrete, we test their link with the dependent variable by using which of the following: A) Cramér's V, B) Student's T-test, C) Kruskal-Wallis test

Accepted Answer

Cramér's V
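
As a rough illustration of the measure named in this answer, the sketch below computes Cramér's V from a contingency table of counts. The function name `cramers_v` and the two sample tables are made up for this example, and the code assumes no row or column total is zero and applies no continuity correction.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Pearson chi-square statistic: sum of (observed - expected)^2 / expected
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # Normalise by sample size and the smaller table dimension minus one
    k = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2 / (n * k))

independent = [[10, 10], [10, 10]]  # no link between the two variables
dependent = [[20, 0], [0, 20]]      # one variable fully determines the other
print(cramers_v(independent), cramers_v(dependent))
```

The first table gives a V of 0 and the second a V of 1, which are the two ends of the scale.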

Question 7

The net present value of profitability of a customer is known as the ________ value.

Accepted Answer

lifetime
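
One simple way to picture lifetime value as a net present value is to discount each year's expected profit back to today. The sketch below assumes yearly profits received at the end of each year and a constant discount rate; the function name and the figures are hypothetical.

```python
def customer_lifetime_value(yearly_profits, discount_rate):
    """Net present value of expected yearly profits (year 1 is discounted once)."""
    return sum(
        profit / (1 + discount_rate) ** year
        for year, profit in enumerate(yearly_profits, start=1)
    )

# Two years of expected profit, discounted at 10% per year
clv = customer_lifetime_value([110.0, 121.0], 0.10)
print(round(clv, 2))
```

Each of the two payments is worth about 100 today, so the lifetime value comes out at roughly 200.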

Question 8

True or false. The Anderson-Darling test is often used where a family of distributions is being tested.

Accepted Answer

True

Question 9

True or false. The Cramér's V test can be used in a bi-variate analysis.

Accepted Answer

True

Question 10

Select the best test for when you have 1 discrete variable and 1 continuous variable (e.g. dosage vs. recovery time): A) Cramér's V, B) ANOVA, C) Wilcoxon signed-rank

Accepted Answer

B) ANOVA

Question 11

A tool which is very suited to the detection of extreme values is the ____ ______.

Accepted Answer

box plot
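
A box plot usually draws as outliers the points lying more than 1.5 interquartile ranges beyond the quartiles. The sketch below applies that rule numerically. The helper name `tukey_outliers` and the sample readings are illustrative, and quartile conventions differ between tools, so the exact fences may not match a given plotting library.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values lying beyond the box-plot whiskers (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

readings = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
outliers = tukey_outliers(readings)
print(outliers)
```

Here only the value 100 falls outside the fences.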

Question 12

Statistical replacement of missing values uses a process called what?

Accepted Answer

Imputation
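
The simplest form of imputation replaces each missing value with the mean of the observed values. The sketch below shows only that variant and assumes missing entries are encoded as `None`; the function name is hypothetical, and more elaborate approaches (regression-based or multiple imputation) are not covered.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

filled = impute_mean([2.0, None, 4.0, None, 6.0])
print(filled)
```

Note that mean imputation keeps the sample mean unchanged but shrinks the variance, which is one reason other methods are often preferred.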

Question 13

What does the acronym MCMC stand for?

Accepted Answer

Markov chain Monte Carlo

Question 14

True or false. Aberrant values can often be detected using simple frequency tables.

Accepted Answer

True

Question 15

True or false. The normality of a variable can be verified by the Shapiro–Wilk test.

Accepted Answer

True

Question 16

True or false. The values of R^2 and χ^2 in the Kruskal–Wallis test increase with the strength of the link.

Accepted Answer

True

Question 17

True or false. Collinearity does not affect decision trees.

Accepted Answer

True

Question 18

True or false. Calculating the correlation coefficients of the variables in pairs is the simplest way of detecting collinearity.

Accepted Answer

True
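
The pairwise check described in this question could look something like the sketch below. The column names, the data and the 0.9 cut-off are all assumptions chosen for illustration; no particular threshold is prescribed by the source.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def collinear_pairs(columns, threshold=0.9):
    """List the variable pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged

columns = {
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],  # same quantity, other unit
    "shoe_color_code": [3, 1, 4, 1, 5],
}
flagged = collinear_pairs(columns)
print(flagged)
```

Only the two height columns are flagged. A pairwise check like this cannot reveal collinearity that involves three or more variables jointly.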

Question 19

Predictive or Descriptive? Adverse events of a drug were explored by clustering the therapeutic classes.

Accepted Answer

Descriptive

Question 20

Predictive or Descriptive? A data analyst receives detailed customer purchasing data and a manager tasks the data analyst with finding associations of any type among customers.

Accepted Answer

Descriptive

Question 21

A data analyst is using analytical CRM to extract, store, analyze, and output relevant customer information. What is the first step within the analytical CRM phase that this analyst will be performing?

Accepted Answer

Combining a customer’s records to develop a holistic view

Question 22

True or false. The development phase cannot be completed in the absence of data.

Accepted Answer

True

Question 23

After performing a normality test on a dataset, results show the null hypothesis should be rejected. Which type of test should now be performed to analyze the data?

Accepted Answer

A non-parametric test

Question 24

What technique should be used to discover links between Age and Income?

Accepted Answer

Pearson correlation

Question 25

What type of data is used in psychographic data within commercial applications?

Accepted Answer

Lifestyle

Question 26

Name 2 types of data used in commercial applications.

Accepted Answer

Data on products and contracts

Question 27

What type of analysis is used to determine the distribution of purchases during a given period of time?

Accepted Answer

RFM (Recency, Frequency, Monetary)
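
To make the three RFM components concrete, the sketch below derives them per customer from a list of transactions. The tuple layout `(customer_id, purchase_date, amount)`, the function name and the sample data are assumptions for this example only.

```python
from datetime import date

def rfm(transactions, today):
    """Summarise (customer_id, purchase_date, amount) tuples per customer."""
    summary = {}
    for customer, when, amount in transactions:
        last, count, total = summary.get(customer, (None, 0, 0.0))
        if last is None or when > last:
            last = when  # keep the most recent purchase date
        summary[customer] = (last, count + 1, total + amount)
    return {
        customer: {
            "recency_days": (today - last).days,  # days since last purchase
            "frequency": count,                   # number of purchases
            "monetary": total,                    # total amount spent
        }
        for customer, (last, count, total) in summary.items()
    }

transactions = [
    ("ann", date(2024, 1, 10), 20.0),
    ("ann", date(2024, 3, 1), 30.0),
    ("bob", date(2024, 2, 1), 15.0),
]
scores = rfm(transactions, today=date(2024, 3, 31))
print(scores)
```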

Question 28

What is the term for the analysis a data analyst performs to study the churn rate (customer retention) and the likely time of churn for customers of a local wireless company?

Accepted Answer

Survival analysis

Question 29

True or false. A decision tree algorithm is both a prediction model and a classification model.

Accepted Answer

True

Question 30

Which 2 methods should data analysts use to reduce processing time when working in SAS? A) Create Booleans as alphanumeric variables, B) Convert the data to a flat file, C) Place the analyzed file on a cloud network, D) Increase RAM

Accepted Answer

A & D

Question 31

A retailer is looking for patterns that will be used for marketing and has given a data analyst a large list of transactions with data on what customers purchased during each store visit. Which mining method should the analyst use?

Accepted Answer

Association rules

Question 32

An automobile manufacturer has obtained access to customer-related data that was previously unavailable. Which method should the manufacturer use to perform descriptive data mining?

Accepted Answer

Factor analysis

Question 33

An analyst has been tasked by a pizza company to provide recommendations for three new restaurants. The best indication of success is based on the population of a surrounding area. Which mining method should be used to provide the recommendation?

Accepted Answer

Clustering

Question 34

When the single dependent variable is quantitative, and the single independent variable is qualitative, which data mining method should the data analyst be using?

Accepted Answer

Decision trees

Question 35

A data analyst needs to identify the most frequently cited words in the documentation and classify them into groups. Which method should the analyst use for classification?

Accepted Answer

Clustering

Question 36

A data analyst wants to reduce the dimensionality of the text from a set of web pages. Which method should the data analyst apply to the dataset?
A) Decision trees, 
B) Kohonen maps, 
C) ANOVA

Accepted Answer

B) Kohonen maps

Question 37

Understanding the expectations of customers and anticipating their needs is a major objective of which of the following? A) Data analysis, B) CRM, C) Inventory Management (IM)

Accepted Answer

CRM

Question 38

Looking at customer behavior and developing a descriptive profile in order to provide personalized marketing strategies for each group is known as:
A) Correspondence Analysis (CA), B) Linear regression, C) Customer Segmentation

Accepted Answer

Customer Segmentation

Question 39

True or false. The variation in the original data set is not maintained when data is discretized.

Accepted Answer

False. Variation IS maintained

Question 40

Which of the following sampling methods divides the population and draws individuals at random from each group? A) Stratified, B) Clustering, C) Systematic

Accepted Answer

A) Stratified
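
A minimal sketch of stratified sampling follows: the population is split into groups, then individuals are drawn at random within each group. The function name, the `stratum_of` callback, the fixed seed and the toy population are all assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw the same fraction at random from every stratum."""
    rng = random.Random(seed)  # seeded so the example is repeatable
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

population = [("north", i) for i in range(80)] + [("south", i) for i in range(20)]
sample = stratified_sample(population, lambda r: r[0], fraction=0.1)
print(len(sample))
```

With an 80/20 population and a 10% fraction, the sample holds 8 northern and 2 southern records, so each group keeps its share.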

Question 41

Relational data is a type of which category of data? A) Product, B) Geodemographic, C) Customer

Accepted Answer

C) Customer

Question 42

Which of the following statistical techniques uses several variables, collectively, to predict one outcome variable? A) PCA, B) Multiple regression, C) RFM

Accepted Answer

B) Multiple regression

Question 43

Which of the following statistical techniques allows you to transform your set of variables by using the variables with the highest variance? A) ANOVA, B) PCA, C) CA

Accepted Answer

B) PCA

Question 44

True or false. Principal Component Analysis (PCA) helps you identify which variables are important so you can compress the data by reducing the number of dimensions.

Accepted Answer

True

Question 45

True or false. Association Analysis allows you to determine the degree to which the items tend to be associated with one another.

Accepted Answer

True. For example, people who buy hamburger buns will also likely buy ketchup and mustard. You can associate items together and create rules.
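
The "degree of association" in a rule such as buns → ketchup is commonly expressed with support and confidence. The sketch below computes both on a tiny set of baskets invented for this example; the function names are hypothetical.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Share of baskets with the antecedent that also hold the consequent."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

baskets = [
    {"buns", "ketchup", "mustard"},
    {"buns", "ketchup"},
    {"buns", "soda"},
    {"milk", "bread"},
]
print(support(baskets, {"buns"}), confidence(baskets, {"buns"}, {"ketchup"}))
```

Buns appear in three of the four baskets, and two of those three also contain ketchup, so the rule buns → ketchup has a confidence of two thirds.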

Question 46

True or false. Logistic regression is used to explain the relationship between a dependent BINARY variable and one (or more) independent variables.

Accepted Answer

True

Question 47

True or false. Naive Bayes is an algorithm that uses a set of training data to construct a model that will classify new data points.

Accepted Answer

True

Question 48

Which of the following prediction techniques involves determining "weights" that describe the influence of the inputs on the target variable?
A) Association analysis, 
B) Linear regression, 
C) Support Vector Machine (SVM)

Accepted Answer

B) Linear regression
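
For a single input, the "weights" of a linear regression are just a slope and an intercept found by least squares. The sketch below fits them in closed form; the function name and the data points are illustrative.

```python
def fit_line(x, y):
    """Ordinary least squares for one input: returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = (
        sum((a - mx) * (b - my) for a, b in zip(x, y))
        / sum((a - mx) ** 2 for a in x)
    )
    intercept = my - slope * mx
    return slope, intercept

# Points that lie exactly on y = 2x + 1
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The fitted slope is the weight of the input: here each unit increase in x raises the prediction by 2.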

Question 49

How should the data be transformed prior to using the ANOVA test?

Accepted Answer

By taking the natural log

Question 50

A non-parametric test doesn't require a ______ data distribution.

Accepted Answer

normal

Question 51

True or false. The Anderson-Darling test is a one-tailed test.

Accepted Answer

True

Question 52

True or false. Cramér's V measures frequency tables of categorical data types (2x2 or larger).

Accepted Answer

True

Question 53

Information collected by a company that measures the importance a consumer places on particular attributes of products or services is called: A) geodemographic data, B) bad data, C) attitudinal data

Accepted Answer

C) attitudinal data

Question 54

Winsorizing is the process of replacing an outlier's original value with the _________ value of an observation not seriously suspect.

Accepted Answer

nearest
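
Following the definition in this question, a sketch of Winsorizing could replace any value outside chosen bounds with the nearest observed value inside them. The bounds 0 and 20 and the sample data are arbitrary assumptions; in practice the bounds often come from percentiles or box-plot fences.

```python
def winsorize(values, lower, upper):
    """Replace values outside [lower, upper] with the nearest in-range observation."""
    kept = [v for v in values if lower <= v <= upper]
    smallest, largest = min(kept), max(kept)
    # Clamp each value to the range spanned by the non-suspect observations
    return [min(max(v, smallest), largest) for v in values]

cleaned = winsorize([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], lower=0, upper=20)
print(cleaned)
```

The suspect value 100 is replaced by 9, the largest observation that is not suspect, rather than being dropped.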

Question 55

True or false. The lower the kurtosis, the smaller the range of values.

Accepted Answer

True

Question 56

Which of the following is a non-parametric test? A) Mann-Whitney, B) Independent-Samples T-test, C) Paired-Samples T-test

Accepted Answer

A) Mann-Whitney

Question 57

True or false. Parametric statistics are used to make inferences about population parameters.

Accepted Answer

True

Question 58

True or false. Non-parametric statistics do not assume that the data or population have any characteristic structure.

Accepted Answer

True

Question 59

The Wilcoxon rank-sum test can be performed using ______ data.

Accepted Answer

ranked
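
To show what working on ranked data means here, the sketch below pools two samples, ranks them (tied values share the average rank) and sums the ranks of the first sample. This gives only the rank-sum statistic, not a p-value; the function names and the data are illustrative.

```python
def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of the positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum(sample_a, sample_b):
    """Sum of the pooled ranks that belong to sample_a."""
    ranks = average_ranks(list(sample_a) + list(sample_b))
    return sum(ranks[: len(sample_a)])

print(rank_sum([1, 3, 5], [2, 4, 6]))
```

Because only the ordering of the values matters, the statistic is unchanged by any monotonic rescaling of the data.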

Question 60

Which of the following 2 non-parametric tests are used in conjunction with Chi-square? A) McNemar, B) Wilcoxon signed-rank, C) Fisher's exact

Accepted Answer

McNemar & Fisher's exact

Question 61

True or false. An "r" value in Pearson's correlation ranges between 0 and 1.

Accepted Answer

False. It ranges between -1 and 1

Question 62

True or false. Cramér's V is a measure of association between two nominal variables, giving a value between 0 and 1.

Accepted Answer

True

Question 63

True or false. Correspondence analysis (CA) is a multivariate statistical technique. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data.

Accepted Answer

True

Question 64

True or false. A specific target is defined when analyzing supervised data.

Accepted Answer

True

WGU-743

Data Mining and Analytics

Question	Answer
True or false: Factor loading is the correlation between the original data and the new principal component	True
Which of the following detects the two-way interactions between tables? A) Decision tree, B) Cluster analysis, C) Text mining	Decision tree
Which of the following identifies large volumes of data distilled into homogeneous groups? A) Decision tree, B) Neural networks, C) Cluster analysis, D) Text mining	Cluster analysis
Name the preferred test given the following: normality & homoscedasticity & 2 samples.	Student's t test
Name the preferred test given the following: normality & homoscedasticity & 3+ samples.	ANOVA
Name the preferred test given the following: normality & heteroscedasticity & 2 samples.	Welch's t test
Name the preferred test given the following: non-normality & heteroscedasticity & 2 samples.	Wilcoxon-Mann-Whitney
Name the preferred test given the following: non-normality & heteroscedasticity & 3+ samples.	Kruskal-Wallis
True or false. Principal component analysis (PCA) algorithms are ideal when working with continuous data.	True
True or false. Correspondence analysis (CA) algorithms are ideal when working with qualitative and binary variables.	True
Name the test: one sample, normal distribution.	One sample t-test
Name the test: one sample, non-normal distribution.	Wilcoxon signed-rank test
Name the test: two samples, non-normal distribution, samples are NOT paired.	Mann-Whitney test
Name the test: three samples, non-normal distribution, samples are NOT paired.	Kruskal-Wallis test
Name the test: one independent continuous variable, testing for degree of relationship, normal distribution. (Hint: ____________ correlation.)	Pearson's correlation
True or false. Factor analysis is a statistical method used to describe variability among observed, correlated variables	True
Parametric statistics are used to make inferences about population ___________.	parameters
True or false. In the field of statistics, we study samples because collecting data for an entire population is generally not feasible.	True
Of the following, which one is a parametric test: A) Mann-Whitney, B) One-way ANOVA, C) Kruskal-Wallis	B) One-way ANOVA
Name any 3 tests that test for homoscedasticity.	Levene, Bartlett, Fisher
Which of the following can be used to explore uni-variate data? A) Locating outliers, B) Graphs and tables, C) Statistical summaries	Statistical summaries (e.g. mean, median, mode, etc.)
An example of a test to use when you have 2 discrete variables is: A) Kruskal-Wallis, B) ANOVA, C) Chi-squared	C) Chi-squared
Name the best test when conducting a pre-test and a post-test from the same population. A) Chi-squared, B) Two-sample T-test, C) Levene	B) Two-sample T-test
True or false. A t-test is a statistic that checks if two means are reliably different from each other.	True
A(n) ___________ describes a characteristic of a population.	parameter
What is prevalence?	The total sub-population with a predefined condition within a population itself
The chi-squared test is a _______________ test. A) parametric, B) non-parametric	non-parametric
The Levene test assesses the _________ of variances for 2 or more groups.	equality
What is Correspondence Analysis (CA)?	A multivariate statistical technique, similar to Principal Component Analysis, but applied to categorical rather than continuous data.
True or false. Cramér's V is a way of calculating correlation in tables. It is used as a pre-test to determine strength of association after chi-square has determined significance.	False. It is a post-test
What is ANOVA?	Analysis of variance
What is the paired t-test?	A statistical procedure used to determine whether the mean difference between two sets of observations is zero.
The Mann-Whitney test is used for __________ (rank) data; it is equivalent to the Independent T-test.	ordinal
The Wilcoxon signed-rank test is used for ordinal (rank) data; it is equivalent to the __________ T-test.	Paired
The McNemar test is used for __________ data; it is equivalent to the Paired T-test.	nominal (named)
True or false. PCA (Principal Component Analysis) is a "dimension increasing" method, converting the correlations among all of the variables into a 2-D graph.	False. It's a dimension-decreasing method
