Data Science 435

Term: Definition
Clustering: attempts to group individuals in a population together by their similarity, without being driven by any specific purpose, e.g., "Do our customers form natural groups or segments?"
Classification and class probability estimation: attempt to predict, for each individual in a population, which of a (small) set of classes the individual belongs to, e.g., classifying emails as spam or legitimate.
Scoring or class probability estimation: a scoring model applied to an individual produces, instead of a class prediction, a score representing the probability (or some other quantification of likelihood) that the individual belongs to each class.
Regression ("value estimation"): attempts to estimate or predict, for each individual, the numerical value of some variable for that individual; used to find a function that models the data with the least error.
Similarity matching: attempts to identify similar individuals based on data known about them; it can be used directly to find similar entities.
Co-occurrence grouping (aka frequent itemset mining, association rule learning, and market-basket analysis): attempts to find associations between entities based on transactions involving them, e.g., "What items are commonly purchased together?"
Profiling (aka behavior description): attempts to characterize the typical behavior of an individual, group, or population, e.g., "What is the typical cell phone usage of this customer segment?" Often used to establish behavioral norms for anomaly detection.
Link prediction: attempts to predict connections between data items, usually by suggesting that a link should exist, and possibly also estimating the strength of the link.
Data reduction: attempts to take a large set of data and replace it with a smaller set that contains much of the important information in the larger set.
Causal modeling: attempts to help us understand which events or actions actually influence others.
Data Science: A set of fundamental principles that provide guidelines for extracting knowledge from data.
Data Mining: The extraction of knowledge from data using various technologies, processes, and algorithms.
Data set: A file that contains data arranged in a meaningful format.
Database: A repository of data arranged in a meaningful structure.
DBMS: A database management system, which provides the ability to perform database operations such as storing, querying, and updating data.
Data Warehouse: A database system designed for analytical tasks; it stores historical data as well as current and master data.
CRISP-DM: Cross-Industry Standard Process for Data Mining.
CRISP-DM phases: Business understanding, data understanding, data preparation, modeling, evaluation, deployment.
Business understanding: Understanding the end business goal that the data mining techniques should support.
Data understanding: Identifying the sources of the data as well as any information necessary to interpret the results.
Data preparation: Preparing the data for mining by ensuring that it is high quality (entered correctly, missing values handled strategically, etc.) and that it can be processed by the chosen data mining algorithm.
Modeling: Applies data mining algorithms to glean insights from the data; for example, classification models output the expected class of an object (e.g., responder or nonresponder).
Evaluation: The model's output is evaluated to determine whether the model is sound or could be improved.
Deployment: After the results have been validated, they can be safely deployed in service of the goal set in the business understanding phase.
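
The Clustering card describes grouping individuals purely by similarity, with no target to predict. The sketch below shows one common clustering algorithm, k-means, in plain Python. The customer coordinates are made up, and seeding the centroids with the first k points is a simplification assumed here for determinism; library implementations typically seed randomly or with k-means++.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[int]:
    """Assign each point to one of k clusters using Euclidean distance."""
    # Simplification: seed centroids with the first k points (deterministic).
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels


# Hypothetical customers described by (visits per month, average spend).
customers = [(1, 1), (1.5, 2), (8, 8), (9, 9.5), (1, 0.5), (8.5, 9)]
print(kmeans(customers, k=2))  # [0, 0, 1, 1, 0, 1]
```

Nothing tells the algorithm what the two groups mean; interpreting them (e.g., "occasional" vs. "frequent" shoppers) is left to the analyst, which is what distinguishes clustering from classification.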
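
The Classification and Scoring cards differ only in what the model returns: a single class, or a probability-like score per class. As one illustration (the choice of k-nearest neighbours is an assumption, not something the cards prescribe), the sketch below scores each class by the share of the k nearest training examples in it, then classifies by taking the highest score. The email features are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Features = Tuple[float, ...]
Example = Tuple[Features, str]  # (feature vector, class label)


def class_scores(train: List[Example], x: Features, k: int = 3) -> Dict[str, float]:
    """Scoring: estimate P(class | x) as the share of the k nearest neighbours in each class."""
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], x))[:k]
    votes = Counter(label for _, label in nearest)
    classes = sorted({label for _, label in train})
    return {label: votes[label] / k for label in classes}


def classify(train: List[Example], x: Features, k: int = 3) -> str:
    """Classification: return the class with the highest estimated probability."""
    scores = class_scores(train, x, k)
    return max(scores, key=scores.get)


# Hypothetical emails described by (number of links, number of "!" characters).
emails = [
    ((8, 5), "spam"), ((7, 6), "spam"), ((9, 4), "spam"),
    ((1, 0), "legitimate"), ((0, 1), "legitimate"), ((2, 1), "legitimate"),
]
print(class_scores(emails, (5, 3.5)))  # {'legitimate': 0.3333333333333333, 'spam': 0.6666666666666666}
print(classify(emails, (5, 3.5)))      # spam
```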
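
The Regression card's "function that models the data with the least error" is most often read as least squares. A minimal sketch for a single numeric predictor, using the closed-form ordinary least squares solution; the spend/sales numbers are invented for illustration.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the line minimising the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form ordinary least squares solution for a single predictor.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (x) versus sales (y).
spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(spend, sales)
print(slope, intercept)         # 2.0 1.0
print(slope * 5.0 + intercept)  # 11.0 (predicted sales for spend = 5)
```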
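
Similarity matching needs some measure of how alike two individuals are. The sketch below assumes each individual is described by a numeric vector and uses cosine similarity, one common choice among several (Euclidean distance is another). The customer names and purchase counts are hypothetical.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """1.0 for vectors pointing the same way, 0.0 for orthogonal ones (assumes non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def most_similar(target: str, profiles: Dict[str, List[float]]) -> str:
    """Return the entity whose profile is closest to the target's by cosine similarity."""
    candidates = [name for name in profiles if name != target]
    return max(candidates, key=lambda name: cosine_similarity(profiles[target], profiles[name]))


# Hypothetical purchase counts per category: (books, music, sports).
profiles = {
    "alice": [5, 1, 0],
    "bob": [4, 1, 0],
    "carol": [0, 2, 5],
}
print(most_similar("alice", profiles))  # bob
```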
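
The core of co-occurrence grouping is counting how often items appear together in the same transaction. The sketch below covers only that counting step for item pairs; full frequent itemset algorithms such as Apriori extend it to larger itemsets and prune candidates. The baskets and the `min_support` threshold are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple


def frequent_pairs(baskets: List[Set[str]], min_support: int) -> List[Tuple[Tuple[str, str], int]]:
    """Count how many baskets contain each pair of items; keep pairs seen >= min_support times."""
    counts: Counter = Counter()
    for basket in baskets:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return [(pair, n) for pair, n in counts.most_common() if n >= min_support]


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "eggs", "milk"},
]
print(frequent_pairs(baskets, min_support=2))  # [(('bread', 'milk'), 3)]
```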
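
The Profiling card notes that a description of typical behavior is often used as a norm for anomaly detection. A minimal sketch, assuming the profile is just the mean and sample standard deviation of past values and that anything more than three standard deviations away is flagged; real profiles are usually richer, and the threshold is a tunable assumption.

```python
import statistics
from typing import List


def flag_anomalies(history: List[float], new_values: List[float], threshold: float = 3.0) -> List[float]:
    """Flag values more than `threshold` standard deviations from the historical mean."""
    # The profile: typical behaviour summarised as mean and sample standard deviation.
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) / stdev > threshold]


# Hypothetical daily call minutes for one customer segment.
history = [30, 32, 28, 31, 29, 30, 33, 27]
print(flag_anomalies(history, [31, 45, 29, 10]))  # [45, 10]
```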
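
Link prediction suggests edges that should exist and can attach a strength to each suggestion. One simple heuristic, assumed here for illustration, scores each unlinked pair by the number of neighbours the two nodes share (the "friends of friends" idea). The friendship graph is made up.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def predict_links(graph: Dict[str, Set[str]]) -> List[Tuple[str, str, int]]:
    """Suggest missing links, scored by the number of neighbours the two nodes share."""
    suggestions = []
    for a, b in combinations(sorted(graph), 2):
        if b in graph[a]:
            continue  # already linked
        shared = len(graph[a] & graph[b])
        if shared > 0:
            suggestions.append((a, b, shared))
    # Strongest suggested links first (sort is stable for equal scores).
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Hypothetical undirected friendship graph (each edge listed in both directions).
friends = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "cat", "dan"},
    "cat": {"ann", "ben", "dan"},
    "dan": {"ben", "cat", "eve"},
    "eve": {"dan"},
}
print(predict_links(friends))
# [('ann', 'dan', 2), ('ben', 'eve', 1), ('cat', 'eve', 1)]
```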
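
Data reduction replaces a large data set with a smaller one that keeps most of the important information. Aggregation is one of the simplest forms (dimensionality-reduction methods such as PCA are another family not shown here). The sketch below, using a hypothetical transaction log, collapses raw rows into one (count, total) summary per customer.

```python
from typing import Dict, List, Tuple


def summarize(transactions: List[Tuple[str, float]]) -> Dict[str, Tuple[int, float]]:
    """Reduce raw (customer, amount) rows to one (count, total) row per customer."""
    summary: Dict[str, Tuple[int, float]] = {}
    for customer, amount in transactions:
        count, total = summary.get(customer, (0, 0.0))
        summary[customer] = (count + 1, total + amount)
    return summary


# Hypothetical transaction log: five rows shrink to two summary rows.
log = [("c1", 10.0), ("c2", 5.0), ("c1", 20.0), ("c1", 2.5), ("c2", 7.5)]
print(summarize(log))  # {'c1': (3, 32.5), 'c2': (2, 12.5)}
```

The trade-off is the one the card implies: the summary is far smaller, but details such as the order and timing of individual purchases are lost.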
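
Causal modeling asks whether an action actually changes an outcome. In the simplest setting, a randomized experiment, the difference in mean outcomes between the treated and control groups estimates the average effect. The sketch below assumes randomized assignment; with observational data the same subtraction can be biased by confounders. The promotion data are hypothetical.

```python
import statistics
from typing import List


def estimated_effect(treated: List[float], control: List[float]) -> float:
    """Difference in mean outcomes between treated and control groups.

    Interpretable as a causal effect only if individuals were assigned to the
    groups at random; with observational data, confounders can bias it.
    """
    return statistics.mean(treated) - statistics.mean(control)


# Hypothetical randomized experiment: weekly purchases with and without a promotion.
with_promo = [12.0, 15.0, 11.0, 14.0]
without_promo = [10.0, 9.0, 11.0, 10.0]
print(estimated_effect(with_promo, without_promo))  # 3.0
```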
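
The Data preparation card mentions handling missing values strategically. Mean imputation is one simple strategy among many (dropping rows, median imputation, or model-based imputation are alternatives). A minimal sketch, assuming missing entries are represented as `None`:

```python
import statistics
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical customer ages with two missing values.
ages = [34.0, None, 28.0, None, 40.0]
print(impute_mean(ages))  # [34.0, 34.0, 28.0, 34.0, 40.0]
```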
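
For the Evaluation phase, one common check is accuracy on held-out data the model did not see during modeling. The sketch below uses a hypothetical threshold model for the responder/nonresponder example from the Modeling card; accuracy is only one possible metric, and the threshold and test data are invented.

```python
from typing import Callable, List, Tuple


def holdout_accuracy(model: Callable[[float], str], test_set: List[Tuple[float, str]]) -> float:
    """Fraction of held-out examples whose predicted class matches the true class."""
    correct = sum(1 for x, label in test_set if model(x) == label)
    return correct / len(test_set)


def threshold_model(spend: float) -> str:
    """Hypothetical model: predicts a response for customers who spent more than 50."""
    return "responder" if spend > 50 else "nonresponder"


# Held-out data the model never saw during modeling (made up for illustration).
test_set = [(80, "responder"), (20, "nonresponder"), (60, "nonresponder"), (90, "responder")]
print(holdout_accuracy(threshold_model, test_set))  # 0.75
```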
Created by: Mindwatcher