Exam 1 Part 7

Data Similarities and Distances and KNN

Question Answer
What is similarity? numerical measure of how alike two data objects are
What is dissimilarity? numerical measure of how different two data objects are
What is proximity? refers to a similarity or dissimilarity
How is a data matrix built? n data points by p dimensions (an n x p matrix) with two modes: rows are objects, columns are attributes
How is a dissimilarity matrix built? n data points by n data points, but registers only the pairwise distances. It is a triangular matrix with a single mode
What is qualitative data? categorical with nominal and ordinal attributes
What is quantitative data? numerical with discrete and continuous attributes
Why is mean absolute deviation better at handling outliers than standard deviation? MAD averages the absolute distances of the data points from the mean, while SD squares the difference between each data point and the mean. Because of the squaring, outliers have a larger impact on SD than on MAD.
Why is logarithmic transformation useful? makes highly skewed distributions less skewed. This can be valuable both for making patterns in the data more interpretable and for helping to meet the assumptions of inferential statistics.
What is a norm? a function that takes a vector from a vector space and returns a non-negative real number that tells us how big that vector is.
What is cosine distance? In what cases is it useful? cosine similarity is the cosine of the angle between two non-zero vectors in an inner product space; cosine distance is 1 minus that similarity. Often used to measure document similarity in text analysis.
What is KNN, and what type of learning is it? k-nearest neighbors, a nonparametric supervised learning method
What are the basic requirements for KNN? An integer K, a set of labeled examples (training data), a metric to measure closeness
What are the pros of using KNN? analytically tractable, simple implementation, nearly optimal with large sample size, uses local info which yields highly adaptive behavior, parallel implementations
What are the cons of using KNN? has large storage requirements, computationally intensive, highly susceptible to curse of dimensionality
How to find k in KNN? use cross-validation; as a rule of thumb let k < sqrt(n), where n is the number of training examples. A larger k smooths out noise, but too large a k blurs the boundaries between classes.
How to handle attributes with large range in KNN? normalize the scale of the attributes
How to handle correlated attributes in KNN? eliminate some attributes or vary and possibly adapt the weight of attributes
How to handle symbolic attributes in KNN? use Hamming distance
How to handle expensive testing in KNN? use a subset of dimensions, pre-sort training examples into fast data structures, compute only an approximate distance, remove redundant data
How to handle KNN storage requirements? remove redundant data, note that pre-sorting increases storage requirement
How to handle KNN curse of dimensionality? increase amount of data
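
The cards above on distances, normalization, and KNN can be tied together in a short sketch. This is a minimal illustration, not a reference implementation: the function names (fit_min_max, apply_min_max, knn_predict) and the toy data are assumptions made up for this example, and it uses only the Python standard library.

```python
import math
from collections import Counter


def euclidean(a, b):
    """L2 distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus cosine similarity; both vectors must be non-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def hamming(a, b):
    """Number of positions where two equal-length symbol sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def fit_min_max(rows):
    """Return per-column (lows, highs) learned from the training rows."""
    cols = list(zip(*rows))
    return [min(c) for c in cols], [max(c) for c in cols]


def apply_min_max(row, lows, highs):
    """Rescale one row so large-range attributes do not dominate the distance."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(row, lows, highs)
    ]


def knn_predict(train_x, train_y, query, k, metric=euclidean):
    """Majority vote among the k labeled examples closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: metric(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): the second attribute has a much larger range.
points = [[1.0, 1000.0], [2.0, 1200.0], [8.0, 9000.0], [9.0, 9500.0]]
labels = ["low", "low", "high", "high"]
query = [7.5, 8000.0]

lows, highs = fit_min_max(points)
scaled_points = [apply_min_max(p, lows, highs) for p in points]
scaled_query = apply_min_max(query, lows, highs)

prediction = knn_predict(scaled_points, labels, scaled_query, k=3)
print(prediction)  # high
```

The three ingredients from the "basic requirements" card appear as arguments to knn_predict: the integer k, the labeled examples, and the metric. Swapping metric=hamming or metric=cosine_distance changes the notion of closeness without touching the voting logic.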
Created by: amhhh