Question 1

Clustering

Accepted Answer

attempts to group individuals in a population together by their similarity, without being driven by any specific purpose, e.g., “Do our customers form natural groups or segments?”
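As a rough illustration of grouping by similarity, here is a minimal sketch of 1-D k-means. The spend figures, the starting centers, and the function name `kmeans_1d` are all made up for this example; real clustering would normally use several features and a library implementation.

```python
def kmeans_1d(values, centers, iterations=10):
    """Tiny 1-D k-means: assign each value to its nearest center, then re-average."""
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

# Hypothetical monthly spend per customer (made-up numbers).
spend = [12, 15, 14, 90, 95, 88]
centers, groups = kmeans_1d(spend, centers=[0.0, 100.0])
print(groups)  # two segments: low spenders and high spenders
```

No target variable is supplied; the two segments emerge only from how close the values are to each other.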

Question 2

Classification and class probability estimation

Accepted Answer

attempt to predict, for each individual in a population, which of a (small) set of classes this individual belongs to, e.g., classifying emails as Spam or Legitimate.

Question 3

Scoring or class probability estimation

Accepted Answer

A scoring model applied to an individual produces, instead of a class prediction, a score representing the probability (or some other quantification of likelihood) that that individual belongs to each class.
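To make the difference between a score and a class prediction concrete, here is a small sketch using a logistic function. The features (`num_links`, `has_greeting`), the weights, and the 0.5 cutoff are hypothetical values chosen for illustration, not a fitted model.

```python
import math

def spam_score(num_links, has_greeting, weights=(-1.0, 0.8, -1.5)):
    """Return a score in (0, 1), read as the estimated probability of 'Spam'."""
    bias, w_links, w_greeting = weights  # hypothetical, hand-picked weights
    z = bias + w_links * num_links + w_greeting * (1 if has_greeting else 0)
    return 1.0 / (1.0 + math.exp(-z))

score = spam_score(num_links=5, has_greeting=False)
# A class prediction can be derived from the score by applying a cutoff.
label = "Spam" if score >= 0.5 else "Legitimate"
print(round(score, 3), label)
```

The score carries more information than the label alone: two emails can both be labeled Spam while one is far more likely to be spam than the other.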

Question 4

Regression (“value estimation”)

Accepted Answer

attempts to estimate or predict, for each individual, the numerical value of some variable for that individual; used to find a function that models the data with the least error
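One common way to "find a function that models the data with the least error" is an ordinary least-squares line. The sketch below assumes a single input variable and squared error as the error measure; the data points and the name `fit_line` are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data that lies exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = fit_line(xs, ys)
predicted = slope * 5 + intercept  # numerical estimate for a new individual
print(slope, intercept, predicted)
```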

Question 5

Similarity matching

Accepted Answer

attempts to identify similar individuals based on data known about them. Similarity matching can be used directly to find similar entities
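A minimal sketch of similarity matching, assuming each individual is described by a numeric vector and that cosine similarity is a reasonable measure. The customer names and purchase counts are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical purchase counts per product category for each customer.
customers = {"ann": [5, 0, 3], "bob": [4, 1, 3], "cat": [0, 5, 0]}

def most_similar(name):
    """Return the other customer whose vector is closest to `name`'s."""
    target = customers[name]
    others = (n for n in customers if n != name)
    return max(others, key=lambda n: cosine_similarity(target, customers[n]))

print(most_similar("ann"))
```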

Question 6

Co-occurrence grouping (aka frequent itemset mining, association rule learning, and market-basket analysis)

Accepted Answer

attempts to find associations between entities based on transactions involving them. An example co-occurrence question would be: What items are commonly purchased together?
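The "purchased together" question can be sketched by counting how often each pair of items appears in the same transaction. The baskets below are invented; a real analysis would typically also compute measures such as support and confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # Sort so that each pair is always counted under the same key.
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, top_count = pair_counts.most_common(1)[0]
print(top_pair, top_count)
```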

Question 7

Profiling (aka behavior description or anomaly detection)

Accepted Answer

attempts to characterize the typical behavior of an individual, group, or population. An example profiling question: “What is the typical cell phone usage of this customer segment?” Profiling is often used to establish behavioral norms for anomaly detection.
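A simple sketch of a profile used as a behavioral norm: summarize typical usage with a mean and standard deviation, then flag values far from that norm. The usage figures and the three-standard-deviation threshold are assumptions made for this example.

```python
import statistics

# Hypothetical monthly cell phone usage (minutes) for one customer segment.
usage_minutes = [310, 295, 305, 320, 290, 300]

typical = statistics.mean(usage_minutes)   # the "profile" of normal usage
spread = statistics.stdev(usage_minutes)   # how much usage normally varies

def is_anomalous(value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from typical."""
    return abs(value - typical) > threshold * spread

print(is_anomalous(900), is_anomalous(315))
```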

Question 8

Link prediction

Accepted Answer

attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link
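One simple heuristic for suggesting a link is to count common neighbors: the more connections two unlinked items share, the stronger the suggested link. The sketch below assumes an undirected friendship graph with made-up names; it is one possible scoring rule, not the only one.

```python
# Hypothetical undirected friendship graph (every link is listed both ways).
friends = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat", "dan"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"bob", "cat", "eve"},
    "eve": {"dan"},
}

def common_neighbors(a, b):
    """Link strength estimate: number of friends a and b share."""
    return len(friends[a] & friends[b])

def suggest_link(person):
    """Suggest the not-yet-linked person with the most shared friends."""
    candidates = [p for p in friends if p != person and p not in friends[person]]
    return max(candidates, key=lambda p: common_neighbors(person, p))

print(suggest_link("ann"), common_neighbors("ann", "dan"))
```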

Question 9

Data reduction

Accepted Answer

attempts to take a large set of data and replace it with a smaller set of data that contains much of the important information in the larger set
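As one possible illustration, aggregation can act as a simple form of data reduction: many transaction rows are replaced by one summary row per customer. The transactions below are invented, and other techniques (for example, dimensionality reduction) would serve the same goal in different ways.

```python
from collections import defaultdict

# Hypothetical row-level data: (customer, purchase amount).
transactions = [
    ("ann", 12.0), ("bob", 5.0), ("ann", 8.0), ("bob", 7.0), ("ann", 10.0),
]

# Accumulate [count, total] per customer.
totals = defaultdict(lambda: [0, 0.0])
for customer, amount in transactions:
    totals[customer][0] += 1
    totals[customer][1] += amount

# The smaller data set: one record per customer instead of one per purchase.
summary = {c: {"count": n, "mean": total / n} for c, (n, total) in totals.items()}
print(summary)
```

Some detail is lost (the individual amounts), but the count and average per customer keep much of the information needed for segment-level questions.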

Question 10

Causal modeling

Accepted Answer

attempts to help us understand what events or actions actually influence others

Question 11

Data Science

Accepted Answer

Set of fundamental principles that provide guidelines for extracting knowledge from data

Question 12

Data Mining

Accepted Answer

The extracting of knowledge from data using different technologies, processes, and algorithms

Question 13

Data set

Accepted Answer

A file that contains data arranged in a meaningful format

Question 14

Database

Accepted Answer

A repository of data that is arranged in a meaningful structure

Question 15

DBMS

Accepted Answer

A database management system is a system that provides the ability to perform different database operations

Question 16

Data Warehouse

Accepted Answer

A database system that is equipped for performing analytical tasks; it stores historical data and contains current and master data

Question 17

CRISP-DM

Accepted Answer

Cross Industry Standard Process for Data Mining

Question 18

CRISP-DM Phases

Accepted Answer

Business understanding, Data understanding, Data preparation, Modeling, Evaluation, Deployment

Question 19

Business understanding

Accepted Answer

Understanding the end business goal that the data mining techniques should support

Question 20

Data understanding

Accepted Answer

Identifying the source of the data as well as any information necessary to interpret the results

Question 21

Data preparation

Accepted Answer

The data should be prepared for data mining by ensuring that the data is high quality (entered properly, missing values handled strategically, etc.) and that it is capable of being processed by the desired data mining algorithm
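One way to handle missing values strategically is to fill them with the median of the known values. This is only a sketch of a single strategy; the `ages` list is hypothetical, and dropping rows or model-based imputation may suit other data better.

```python
import statistics

# Hypothetical column with missing entries recorded as None.
ages = [34, None, 29, 41, None, 38]

known = [a for a in ages if a is not None]
fill = statistics.median(known)  # robust to outliers, unlike the mean

# Replace each missing entry so the algorithm receives only numbers.
cleaned = [fill if a is None else a for a in ages]
print(cleaned)
```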

Question 22

Modeling

Accepted Answer

Employs data mining algorithms to glean insights from the data; for example, classification models output the expected class of an object (e.g., responder or nonresponder)

Question 23

Evaluation

Accepted Answer

The output of the model is evaluated to see if the model is sound or if it might be improved
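A minimal sketch of one evaluation check: compare the model's predictions with known outcomes and compute accuracy. The labels below are made up, and accuracy is only one of several possible measures of whether a model is sound.

```python
# Hypothetical known outcomes and model outputs for five held-out customers.
actual = ["responder", "nonresponder", "responder", "nonresponder", "nonresponder"]
predicted = ["responder", "nonresponder", "nonresponder", "nonresponder", "nonresponder"]

# Fraction of cases where the predicted class matches the actual class.
correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(accuracy)
```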

Question 24

Deployment

Accepted Answer

After the results have been validated, they can be safely deployed by returning to the goal set in the business understanding phase



Data Science 435