# WGU-743

### Data Mining and Analytics

What is "Scoring"? (Hint: used in marketing.) The probability that a customer will behave in a certain way in the future
Segmenting the population into groups that are homogeneous (similar) in order to construct a specific model for each segment is called: A) Stratification of models, B) Unsupervised segmentation, C) PCA A) Stratification of models
What is "unsupervised" segmentation? Segmenting where general characteristics of the sample have no direct relationship with the dependent variable
The Shapiro–Wilk test tests the ____ __________ that a given sample comes from a normally distributed population. null hypothesis
True or false. The Kolmogorov-Smirnov test is a statistical hypothesis test. True
For variables that are qualitative or discrete, we test their link with the dependent variable by using which of the following: A) Cramér's V, B) Student's T-test, C) Kruskal-Wallis test Cramér's V
The net present value of profitability of a customer is known as the ________ value. lifetime
True or false. The Anderson-Darling test is often used where a family of distributions is being tested. True
True or false. The Cramér's V test can be used in a bi-variate analysis. True
Select the best test for when you have 1 discrete variable and 1 continuous variable (e.g. dosage vs. recovery time): A) Cramér's V, B) ANOVA, C) Wilcoxon signed-rank B) ANOVA
A tool which is very suited to the detection of extreme values is the ____ ______. box plot
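
The whisker rule behind a box plot can be sketched in a few lines. This is a minimal illustration, assuming the common Tukey convention of 1.5 × IQR fences and the "inclusive" quartile method; the function name `tukey_outliers` and the sample data are made up for the example.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Return the values that fall outside the box-plot whiskers."""
    # Quartiles of the sample; the "inclusive" method is an assumption here.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

recovery_days = [2, 3, 4, 5, 6, 7, 8, 9, 50]  # hypothetical sample
outliers = tukey_outliers(recovery_days)
print(outliers)  # [50]
```
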
Statistical replacement of missing values uses a process called what? Imputation
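
As a rough illustration of imputation, the sketch below fills missing entries with the mean of the observed ones. Mean imputation is only one simple strategy, chosen here as an assumption; the helper `impute_mean` and the data are hypothetical.

```python
from statistics import mean

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

incomes = [4.0, None, 6.0, 8.0, None]  # hypothetical data with gaps
filled = impute_mean(incomes)
print(filled)  # [4.0, 6.0, 6.0, 8.0, 6.0]
```
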
What does the acronym MCMC stand for? Markov chain Monte Carlo
True or false. Aberrant values can often be detected using simple frequency tables. True
True or false. The normality of a variable can be verified by the Shapiro–Wilk test. True
True or false. The values of R^2 and χ^2 in the Kruskal–Wallis test increase with the strength of the link. True
True or false. Collinearity does not affect decision trees. True
True or false. Calculating the correlation coefficients of the variables in pairs is the simplest way of detecting collinearity. True
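
The pairwise check can be sketched as follows: compute Pearson's r for every pair of variables and flag the pairs whose absolute correlation is high. The 0.9 threshold, the function names, and the toy columns are assumptions for illustration only.

```python
from itertools import combinations
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def collinear_pairs(columns, threshold=0.9):
    """Return the pairs of column names with |r| above the threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        if abs(pearson_r(columns[a], columns[b])) > threshold:
            flagged.append((a, b))
    return flagged

columns = {  # hypothetical variables
    "age": [25, 35, 45, 55],
    "income": [30, 40, 50, 60],
    "noise": [3, 1, 4, 2],
}
flagged = collinear_pairs(columns)
print(flagged)  # [('age', 'income')]
```
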
Predictive or Descriptive? Adverse events of a drug were explored by clustering the therapeutic classes. Descriptive
Predictive or Descriptive? A data analyst receives detailed customer purchasing data and a manager tasks the data analyst with finding associations of any type among customers. Descriptive
A data analyst is using analytical CRM to extract, store, analyze, and output relevant customer information. What is the first step within the analytical CRM phase that this analyst will be performing? Combining a customer’s records to develop a holistic view
True or false. The development phase cannot be completed in the absence of data. True
After performing a normality test on a dataset, results show the null hypothesis should be rejected. Which type of test should now be performed to analyze the data? A non-parametric test
What technique should be used to discover links between Age and Income? Pearson correlation
What type of data is used in psychographic data within commercial applications? Lifestyle
Name 2 types of data used in commercial applications. Data on products and contracts
What type of analysis is used to determine the distribution of purchases during a given period of time? RFM (Recency, Frequency, Monetary)
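
To make the three RFM components concrete, here is a small sketch that derives recency (days since the last purchase), frequency (number of purchases), and monetary (total spend) per customer. The tuple layout, the `rfm` helper, and the transactions are hypothetical.

```python
from datetime import date

def rfm(transactions, as_of):
    """Summarize (customer, day, amount) tuples into RFM values per customer."""
    table = {}
    for customer, day, amount in transactions:
        row = table.setdefault(
            customer, {"last": day, "frequency": 0, "monetary": 0.0}
        )
        row["last"] = max(row["last"], day)  # most recent purchase date
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }

transactions = [  # hypothetical purchases
    ("ann", date(2024, 1, 5), 20.0),
    ("ann", date(2024, 3, 1), 35.0),
    ("bob", date(2024, 2, 10), 50.0),
]
scores = rfm(transactions, as_of=date(2024, 3, 31))
print(scores["ann"])
```
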
What is the term used when a data analyst needs to analyze the churn rate (customer retention) and the time of possible churn of customers for a local wireless company? Survival analysis
True or false. A decision tree algorithm is both a prediction model and a classification model. True
Which 2 methods should data analysts use to reduce processing time when working in SAS? A) Create Booleans as alphanumeric variables, B) Convert the data to a flat file, C) Place the analyzed file on a cloud network, D) Increase RAM A & D
A retailer is looking for patterns that will be used for marketing and has given a data analyst a large list of transactions with data on what customers purchased during each store visit. Which mining method should the analyst use? Association rules
An automobile manufacturer has obtained access to customer-related data that was previously unavailable. Which method should the manufacturer use to perform descriptive data mining? Factor analysis
An analyst has been tasked by a pizza company to provide recommendations for three new restaurants. The best indication of success is based on the population of a surrounding area. Which mining method should be used to provide the recommendation? Clustering
When the single dependent variable is quantitative, and the single independent variable is qualitative, which data mining method should the data analyst be using? Decision trees
A data analyst needs to identify the most frequently cited words in the documentation and classify them into groups. Which method should the analyst use for classification? Clustering
A data analyst wants to reduce the dimensionality of the text from a set of web pages. Which method should the data analyst apply to the dataset? A) Decision trees, B) Kohonen maps, C) ANOVA B) Kohonen maps
Understanding the expectations of customers and anticipating their needs is a major objective of which of the following? A) Data analysis, B) CRM, C) Inventory Management (IM) CRM
Looking at customer behavior and developing a descriptive profile in order to provide personalized marketing strategies for each group is known as: A) Correspondence Analysis (CA), B) Linear regression, C) Customer Segmentation Customer Segmentation
True or false. The variation in the original data set is not maintained when data is discretized. False. Variation IS maintained
Which of the following sampling methods divides the population and draws individuals at random from each group? A) Stratified, B) Clustering, C) Systematic A) Stratified
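
A minimal sketch of stratified sampling: group the records by a stratum key, then draw the same number of individuals at random from each group. Equal allocation per group, the fixed seed, and the `stratified_sample` helper are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=0):
    """Draw per_group records at random from each stratum defined by key."""
    rng = random.Random(seed)  # seeded only so the example is repeatable
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for name in sorted(groups):
        sample.extend(rng.sample(groups[name], per_group))
    return sample

# Hypothetical population: 10 customers in "north", 6 in "south".
records = [("north", i) for i in range(10)] + [("south", i) for i in range(6)]
sample = stratified_sample(records, key=lambda r: r[0], per_group=3)
print(sample)
```
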
Relational data is a type of which category of data? A) Product, B) Geodemographic, C) Customer C) Customer
Which of the following statistical techniques uses several variables, collectively, to predict one outcome variable? A) PCA, B) Multiple regression, C) RFM B) Multiple regression
Which of the following statistical techniques allows you to transform your set of variables by using the variables with the highest variance? A) ANOVA, B) PCA, C) CA B) PCA
True or false. Principal Component Analysis (PCA) helps you identify which variables are important so you can compress the data by reducing the number of dimensions. True
True or false. Association Analysis allows you to determine the degree to which the items tend to be associated with one another. True. For example, people who buy hamburger buns will also likely buy ketchup and mustard. You can associate items together and create rules.
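
The degree of association in a rule such as "buns → ketchup" is commonly expressed with support and confidence. The sketch below computes both for a handful of made-up baskets; the function names and the data are hypothetical.

```python
def support(baskets, items):
    """Fraction of baskets that contain every item in items."""
    items = set(items)
    return sum(items <= basket for basket in baskets) / len(baskets)

def confidence(baskets, antecedent, consequent):
    """Estimated P(consequent | antecedent) from the baskets."""
    both = set(antecedent) | set(consequent)
    return support(baskets, both) / support(baskets, antecedent)

baskets = [  # hypothetical store visits
    {"buns", "ketchup", "mustard"},
    {"buns", "ketchup"},
    {"buns"},
    {"milk"},
]
rule_support = support(baskets, {"buns", "ketchup"})
rule_confidence = confidence(baskets, {"buns"}, {"ketchup"})
print(rule_support, rule_confidence)
```

Here half of all baskets contain both items, and two of the three baskets with buns also contain ketchup.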
True or false. Logistic regression is used to explain the relationship between a dependent BINARY variable and one (or more) independent variables. True
True or false. Naive Bayes is an algorithm that uses a set of training data to construct a model that will classify new data points. True
Which of the following prediction techniques involves determining "weights" that describe the influence of the inputs on the target variable? A) Association analysis, B) Linear regression, C) Support Vector Machine (SVM) B) Linear regression
How should the data be transformed prior to using the ANOVA test? By taking the natural log
A non-parametric test doesn't require a ______ data distribution. normal
True or false. The Anderson-Darling test is a one-tailed test. True
True or false. Cramér's V measures frequency tables of categorical data types (2x2 or larger). True
Information collected by a company that measures the importance a consumer places on particular attributes of products or services is called: A) geodemographic data, B) bad data, C) attitudinal data C) attitudinal data
Winsorizing is the process of replacing an outlier's original value with the _________ value of an observation not seriously suspect. nearest
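
A small sketch of winsorizing, assuming the `k` smallest and `k` largest observations are the suspect ones (that choice, and the helper name, are assumptions). Each suspect value is replaced by the nearest value that is not suspect.

```python
def winsorize(values, k=1):
    """Clamp the k lowest and k highest values to their nearest neighbours."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]  # nearest non-suspect values
    return [min(max(v, low), high) for v in values]

data = [100, 12, 1, 11, 13, 10, 14]  # hypothetical sample with two extremes
cleaned = winsorize(data)
print(cleaned)  # [14, 12, 10, 11, 13, 10, 14]
```
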
True or false. The lower the kurtosis, the smaller the range of values. True
Which of the following is a non-parametric test? A) Mann-Whitney, B) Independent-Samples T-test, C) Paired-Samples T-test A) Mann-Whitney
True or false. Parametric statistics are used to make inferences about population parameters. True
True or false. Non-parametric statistics do not assume that the data or population have any characteristic structure. True
The Wilcoxon rank-sum test can be performed using ______ data. ranked
Which of the following 2 non-parametric tests are used in conjunction with Chi-square? A) McNemar, B) Wilcoxon signed-rank, C) Fisher's exact McNemar & Fisher's exact
True or false. An "r" value in Pearson's correlation ranges between 0 and 1. False. It ranges between -1 and 1
True or false. Cramér's V is a measure of association between two nominal variables, giving a value between 0 and 1. True
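
For reference, Cramér's V can be computed from the chi-squared statistic as V = sqrt(χ² / (n · (min(r, c) − 1))) for an r × c table with n observations. The sketch below applies that formula without any continuity correction; the function name and the two tables are made up for illustration.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for an r x c contingency table given as a list of rows."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0])) - 1
    return sqrt(chi2 / (n * k))

independent = [[10, 20], [20, 40]]  # rows proportional: no association
perfect = [[30, 0], [0, 30]]        # each row maps to exactly one column
print(cramers_v(independent), cramers_v(perfect))
```

The first table gives 0 (no link) and the second gives 1 (perfect link), the two ends of the range.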
True or false. Correspondence analysis (CA) is a multivariate statistical technique. It is conceptually similar to principal component analysis, but applies to categorical rather than continuous data. True
True or false. A specific target is defined when analyzing supervised data. True
True or false. Factor loading is the correlation between the original data and the new principal component. True
Which of the following detects the two-way interactions between tables? A) Decision tree, B) Cluster analysis, C) Text mining Decision tree
Which of the following identifies large volumes of data distilled into homogeneous groups? A) Decision tree, B) Neural networks, C) Cluster analysis, D) Text mining Cluster analysis
Name the preferred test given the following: normality & homoscedasticity & 2 samples. Student's t test
Name the preferred test given the following: normality & homoscedasticity & 3+ samples. ANOVA
Name the preferred test given the following: normality & heteroscedasticity & 2 samples. Welch's t test
Name the preferred test given the following: non-normality & heteroscedasticity & 2 samples. Wilcoxon-Mann-Whitney
Name the preferred test given the following: non-normality & heteroscedasticity & 3+ samples. Kruskal-Wallis
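
The five "preferred test" cards above can be collected into one lookup. This sketch only encodes the combinations listed on those cards and returns `None` for anything else; the function name and parameters are hypothetical, and `n_samples` is assumed to be at least 2.

```python
def preferred_test(normal, equal_variances, n_samples):
    """Map the cards' conditions to a test name, or None if not covered."""
    many = n_samples >= 3
    if normal and equal_variances:
        return "ANOVA" if many else "Student's t test"
    if normal and not equal_variances and not many:
        return "Welch's t test"
    if not normal and not equal_variances:
        return "Kruskal-Wallis" if many else "Wilcoxon-Mann-Whitney"
    return None  # combination not covered by the cards

print(preferred_test(normal=True, equal_variances=False, n_samples=2))
```
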
True or false. Principal component analysis (PCA) algorithms are ideal when working with continuous data. True
True or false. Correspondence analysis (CA) algorithms are ideal when working with qualitative and binary variables. True
Name the test: one sample, normal distribution. One sample t-test
Name the test: one sample, non-normal distribution. Wilcoxon signed-rank test
Name the test: two samples, non-normal distribution, samples are NOT paired. Mann-Whitney test
Name the test: three samples, non-normal distribution, samples are NOT paired. Kruskal-Wallis test
Name the test: one independent continuous variable, testing for degree of relationship, normal distribution. (Hint: ____________ correlation.) Pearson's correlation
True or false. Factor analysis is a statistical method used to describe variability among observed, correlated variables. True
Parametric statistics are used to make inferences about population ___________ . parameters
True or false. In the field of statistics, we study samples because collecting data for an entire population is generally not feasible. True
Of the following, which one is a parametric test: A) Mann-Whitney, B) One-way ANOVA, C) Kruskal-Wallis B) One-way ANOVA
Name any 3 tests that test for homoscedasticity. Levene, Bartlett, Fisher
Which of the following can be used to explore uni-variate data? A) Locating outliers, B) Graphs and tables, C) Statistical summaries Statistical summaries (e.g. mean, median, mode, etc.)
An example of a test to use when you have 2 discrete variables is: A) Kruskal-Wallis, B) ANOVA, C) Chi-squared C) Chi-squared
Name the best test when conducting a pre-test and a post-test from the same population. A) Chi-squared, B) Two-sample T-test, C) Levene B) Two-sample T-test
True or false. A t-test is a statistic that checks if two means are reliably different from each other. True
A(n) ___________ describes a characteristic of a population. parameter
What is prevalence? The total sub-population with a predefined condition within a population itself
The chi-squared test is a _______________ test. A) parametric, B) non-parametric non-parametric
The Levene test assesses the _________ of variances for 2 or more groups. equality
What is Correspondence Analysis (CA)? A multivariate statistical technique, similar to Principal Component Analysis but it applies to categorical data rather than continuous data.
True or false. Cramér's V is a way of calculating correlation in tables. It is used as a pre-test to determine strengths of association after chi-square has determined significance. False. It is a post-test
What is ANOVA? Analysis of Variance
What is the paired t-test? A statistical procedure used to determine whether the mean difference between two sets of observations is zero.
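
The paired t statistic is the mean of the pairwise differences divided by its standard error. The sketch below computes only the statistic, not a p-value, and the before/after scores are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t statistic for the null hypothesis that the mean difference is zero."""
    diffs = [a - b for b, a in zip(before, after)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error

pre_test = [10, 12, 9, 11]    # hypothetical scores before
post_test = [12, 14, 10, 14]  # hypothetical scores after, same subjects
t_stat = paired_t(pre_test, post_test)
print(round(t_stat, 3))  # 4.899
```

The statistic would then be compared against a t distribution with n − 1 degrees of freedom (3 here).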
The Mann-Whitney test is used for __________ (rank) data; it is equivalent to the Independent T-test. ordinal
The Wilcoxon signed-rank test is used for ordinal (rank) data; it is equivalent to the __________ T-test. Paired
The McNemar test is used for __________ data; it is equivalent to the Paired T-test. nominal (named)
True or false. PCA (Principal Component Analysis) is a "dimension increasing" method, converting the correlations among all of the variables into a 2-D graph. False. It's a dimension *decreasing* method
Created by: ronzStack