MIS 665 Final2SS

Terms from Summary Sheets

Term | Definition
More or fewer features with similar R^2: which "wins"? | Fewer
What data do we fit on? | Training data (NEVER test data)
What do we check before modeling to catch multicollinearity? | Correlation heatmap
Is a lower or higher MSE better? | Lower
Higher R^2 means | more variance explained
StandardScaler changes ___, NOT ___ (in plain linear regression) | coefficient scale, model accuracy
Parsimony | fewer features with similar R^2 is preferred
Lasso (alpha=1) | automatic feature selection via L1 penalty
DATA LEAKAGE | never fit_transform on full X; split first!
KNN requires ___ | StandardScaler (uses distance)
Logistic Regression gives ____, not just class labels | probabilities
SelectKBest: fit on training data only | leakage rule
0.5 is ___ guess, 1.0 is ___ guess | random, perfect
Use _____ for reliable accuracy | cross_val_score(cv=10)
What avoids the dummy trap in OneHotEncoder? | drop='first'
ALWAYS do what before clustering? | standardize
______ features dominate distance | high-variance
Is clustering supervised or unsupervised? | unsupervised (NO Y)
Silhouette score of ___+ is strong, ____ is reasonable | 0.71+, 0.51-0.70
K-Means++ init | reduces sensitivity to random start
Profile clusters on _________ data for meaning | original (unscaled)
PCA is ____-based, MUST scale first | variance
Fit PCA on ___ data only, transform both separately | training
n_components=0.90 | auto-selects fewest PCs for 90% variance
PCA replaces feature names | use loadings to interpret
Embeddings understand ___ | synonyms
Pipeline: | represent text > reduce dims > build classifier
A silhouette score of 0.50 generally indicates | moderately well-separated clusters
In healthcare, which metric is most important? | Recall (measure of how many actual positives were caught)
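
The leakage, scaling, and cross-validation cards above can be sketched together in scikit-learn. This is only an illustration under stated assumptions: the synthetic data from make_classification, the 80/20 split, and n_neighbors=5 are placeholders chosen for the example, not values from the summary sheets.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data stands in for a real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Split FIRST, so test rows never influence any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# KNN is distance-based, so it is paired with StandardScaler.
# Inside cross_val_score the pipeline refits the scaler on each fold's
# training portion only, which avoids leakage between folds.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

cv_scores = cross_val_score(model, X_train, y_train, cv=10)
print(f"10-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Final fit on the training data; the test set is only transformed and scored.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"Held-out test accuracy: {test_accuracy:.3f}")
```

Wrapping the scaler and the model in one pipeline is one common way to follow the "split first" rule, because fit is never called on test rows.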
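
The clustering and PCA cards above might look like this in practice. It is a rough sketch, not a definitive workflow: the make_blobs data, the choice of 3 clusters, and the inflated first feature are assumptions made only to show why standardizing matters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic, unlabeled data (assumption); clustering uses no y.
X, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
X[:, 0] *= 100  # one high-variance feature that would dominate raw distances

# Standardize before clustering so every feature contributes comparably.
X_scaled = StandardScaler().fit_transform(X)

# k-means++ initialization reduces sensitivity to the random start.
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)
sil = silhouette_score(X_scaled, labels)
print(f"Silhouette score: {sil:.2f}")

# Profile clusters on the ORIGINAL (unscaled) data so the means are interpretable.
profile = np.vstack([X[labels == k].mean(axis=0) for k in range(3)])
print(profile.round(2))

# A float n_components keeps the fewest PCs that explain at least 90% of variance.
# In a supervised workflow PCA would be fit on the training split only.
pca = PCA(n_components=0.90)
X_pca = pca.fit_transform(X_scaled)
loadings = pca.components_  # rows = components, columns = original features
print(f"Components kept: {pca.n_components_}")
```

The loadings array is what lets you relate each principal component back to the original feature names.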
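
The logistic regression and recall cards above can be illustrated with a short example. The data is synthetic and the split sizes are assumptions made for the sketch; recall is computed by hand as TP / (TP + FN) and compared with scikit-learn's recall_score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic binary data (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=6, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# predict_proba returns one probability per class, not just a label.
proba = clf.predict_proba(X_test)
pred = clf.predict(X_test)

# Recall = share of actual positives that the model caught.
tp = int(np.sum((pred == 1) & (y_test == 1)))
fn = int(np.sum((pred == 0) & (y_test == 1)))
manual_recall = tp / (tp + fn)
recall = recall_score(y_test, pred)
print(f"Recall: {recall:.3f}")
```

A missed positive (a false negative) lowers recall, which is why the healthcare card singles this metric out.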
Created by: lexi.welte
 

 


