MIS 665 Final
| Question | Answer |
|---|---|
| the role of machine learning within the broader AI hierarchy | Machine learning is a subset of AI that learns patterns from data |
| Example of unsupervised learning | Grouping customers into segments based on purchasing behavior (no target variable is predicted) |
| R-Squared (coefficient of determination) measures: | the proportion of variance in Y explained by the model |
| Multicollinearity in regression refers to | High correlation among the predictor variables themselves |
| When applying a preprocessing pipeline to scoring (deployment) data, you should use | scaler.transform(X_score) to apply the scale learned from training data |
| The primary reason for splitting data into training and test sets is to | Evaluate how well the model generalizes to data it has not seen before |
| What does Lasso regularization do to coefficients of predictors that contribute little to the model | It shrinks them toward zero, potentially to exactly zero |
| In healthcare classification, which metric is most critical to minimize when missing a positive case (e.g., a patient at risk) carries severe consequences | False Negative Rate |
| In the lab's clustering dataset (Weight, Cholesterol, Gender), which feature dominated cluster assignments when the data was NOT standardized | Cholesterol, because it had the highest variance |
| The Elbow method for selecting k plots which quantity on the y-axis? | Within-Cluster Sum of Squares (WCSS / inertia) |
| In Lab 1 (Boston Housing), how many PCA components were needed to retain at least 90% of the variance in the 13-feature dataset? | 7 |
| The explained variance ratio of a principal component represents | the proportion of total variance in the original data captured by that component |
| In TF-IDF vectorization, each column in the resulting feature matrix represents | A unique word (term) from the vocabulary, weighted by frequency and rarity |
| Cosine similarity between two vectors measures | the cosine of the angle between the two vectors, with 1.0 meaning identical direction |
| The all-MiniLM-L6-v2 embedding model used in Lab 3 produces vectors with how many dimensions | 384 |
| Why does a plain LLM fail when asked about a private company's Q3 revenue | The document is not in the model's training data, so the model either refuses or fabricates a confident-sounding answer |
| Place the six stages of a RAG pipeline in the correct order | Load --> Chunk --> Embed --> Store --> Retrieve --> Generate |
| The embedding model (gemini-embedding-001) used in this lab returns | A 3,072-number vector that captures the meaning of the input text |
| Why does cosine similarity on embeddings reliably retrieve the right document even when the query phrase does not appear verbatim in the corpus | Because embeddings encode meaning, so synonyms and paraphrases land near each other in vector space |
| The two main types of supervised learning problems covered in Lab 1 are ____ (predicting a number) and _____ (predicting a category) | regression, classification |
| In k-Means clustering, the algorithm repeats two steps until convergence: assigning each point to the nearest ___, and then recalculating the ___ of each cluster | centroid, centroid |
| In the AI hierarchy, ____ AI extends generative AI by taking actions and autonomously completing multi-step goals | Agentic |
| Deep learning is preferred over traditional ML when working with ________ data such as images, audio, and raw text | unstructured |
| ColumnTransformer allows you to apply ______ to numeric features and ____ to categorical features within a single preprocessing step | transformers/scalers, encoders |
| A false negative in the heart attack classifier means a patient who ___ have a second heart attack was predicted ____ | did, healthy |
| Logistic Regression outputs a value between 0 and 1 that represents the _____ of the positive class, which can be thresholded at 0.5 to produce a class label | probability |
| The Euclidean distance from (5,5) to (0,0), computed as the square root of (5^2 + 5^2), equals approximately ___ | 7.07 |
| To profile the clusters and understand what each group represents, we calculate the ____ of each feature within each cluster label | mean/centroid |
| PCA components are ordered so that the first component explains the _____ amount of variance, the second explains the next most, and so on | largest |
| TF-IDF stands for Term Frequency - ___ Document Frequency | Inverse |
| Unlike TF-IDF, embedding vectors are ____ -- every dimension carries meaning and no entries are zero | dense |
| Cosine similarity measures the ____ between two vectors rather than their absolute distance, making it well-suited for comparing embeddings of texts with different lengths | angle |
| The six stages of every RAG pipeline are: Load, ____, Embed, Store, Retrieve, and Generate | Chunk |
| The embedding model gemini-embedding-001 produces vectors with _______ dimensions, so embedding 100 chunks yields a matrix of shape (100,____) | 3072, 3072 |
| ML model workflow | Define X and y variables, split off validation data, initialize the model, fit(), predict(), compare actual y with predicted y, deploy the model |
| PCA is used for | Reducing the number of features in numerical columns |
| Embedding is used for | text data |
| RAG stands for | Retrieval Augmented Generation |
| RAG is used for | Answering questions over private data; retrieval relies on embeddings |
| LLM vs RAG | A plain LLM answers from memory of its training data (public internet data); RAG retrieves from a provided knowledge base and works like a smart search system |
| What are the 3 main types of ML problems? | Regression (numerical output), Classification (categorical output), Clustering (no labeled output). |
| What is regression? | A supervised learning method that predicts continuous numerical values. |
| What is classification? | A supervised learning method that predicts categories or labels. |
| What is clustering? | An unsupervised learning method that groups similar data points without labels. |
| What is the relationship between AI, ML, and DL? | AI ⟶ ML ⟶ DL (DL is a subset of ML, ML is a subset of AI). |
| What is ML best at? | Structured/tabular data. |
| What is Deep Learning best at? | Unstructured data like images, text, and video. |
| What is the order of AI evolution? | ML → DL → Generative AI → RAG → Agentic AI |
| What is the goal of regression? | Minimize the sum of squared errors between actual and predicted values. |
| Why do we split off validation data? | To test model performance on unseen data and detect overfitting |
| Why do we standardize numerical columns? | To put all features on the same scale so no variable dominates. |
| What is multicollinearity? | When features are highly correlated with each other. |
| How do you handle multicollinearity? | Remove variables, use Lasso regression, or feature selection. |
| What is Lasso regression? | A regression method that penalizes large coefficients and performs feature selection |
| What is f_regression used for? | Feature selection by measuring the relationship between each feature and the target variable. |
| What is ColumnTransformer? | A tool to apply different preprocessing steps to different column types. |
| Why use ColumnTransformer? | To scale numerical data and encode categorical data correctly in one pipeline. |
| What is a confusion matrix? | A table showing TP, TN, FP, FN results of a classification model. |
| ML workflow steps? | 1) Initialize model → 2) Fit → 3) Predict |
| fit() vs transform()? | fit() learns parameters from the data; transform() applies the learned transformation. |
| What is fit_transform()? | Fits data and then immediately transforms it. |
| What is a pipeline in ML? | A sequence of preprocessing + modeling steps combined into one workflow. |
| Name common classification models. | Decision Tree, KNN, Logistic Regression, Random Forest. |
| What is the goal of clustering? | To group similar data points for profiling or pattern discovery. |
| What is Euclidean distance? | The straight-line distance between two points. |
| What is the elbow method? | A technique to choose the optimal number of clusters (K). |
| What is PCA (Principal Component Analysis)? | A method that reduces features while keeping most variance |
| Why standardize before PCA? | Because PCA is sensitive to scale. |
| Why is TF-IDF a sparse matrix? | Because most entries are zero (most vocabulary words don’t appear in any given document). |
| What is chunking in RAG? | Splitting documents into smaller sections (like paragraphs). |
| What does retrieval do in RAG? | Finds the most relevant chunks (top-k) based on similarity. |
| What does the LLM do in RAG? | Generates a final response using retrieved information. |
| **Phase 1** | Data Capture |
| **Phase 2** | Data Preparation & Transformation |
| **Phase 3** | Descriptive Analytics / Predictive Analytics |
| ML uses unstructured/structured data? | Structured |
| DL uses unstructured/structured data? | Unstructured |
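
The Euclidean distance and cosine similarity cards above can be checked by hand with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `euclidean_distance` and `cosine_similarity` are illustrative and do not come from the labs.

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# The flashcard example: distance from (5, 5) to (0, 0).
distance = euclidean_distance((5, 5), (0, 0))

# Same direction, different magnitude: similarity is 1.0 up to rounding.
same_direction = cosine_similarity((1, 2), (2, 4))

# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity((1, 0), (0, 1))

print(round(distance, 2))        # 7.07
print(round(same_direction, 2))  # 1.0
print(round(orthogonal, 2))      # 0.0
```

The second example shows why cosine similarity ignores magnitude: (2, 4) is twice as long as (1, 2), yet the two vectors point the same way.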
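
The fit() versus transform() cards can be illustrated with a toy scaler. The class `SimpleScaler` below is a hypothetical stand-in written for this sketch, not scikit-learn's implementation, and the numbers are made up. It shows the mean and standard deviation being learned from training data only, then reused on scoring data.

```python
import statistics

class SimpleScaler:
    """Hypothetical stand-in for a standardizing scaler (illustration only)."""

    def fit(self, values):
        # Learn the mean and population standard deviation from training data.
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values)
        return self

    def transform(self, values):
        # Apply the already-learned mean and std; nothing is re-learned here.
        return [(v - self.mean_) / self.std_ for v in values]

    def fit_transform(self, values):
        # Convenience: fit on the data, then immediately transform it.
        return self.fit(values).transform(values)

train = [10.0, 20.0, 30.0]  # made-up training column
score = [40.0]              # made-up scoring (deployment) value

scaler = SimpleScaler()
train_scaled = scaler.fit_transform(train)  # learns mean=20 from training data
score_scaled = scaler.transform(score)      # reuses the training mean and std

print(scaler.mean_)
print([round(v, 3) for v in train_scaled])
print([round(v, 3) for v in score_scaled])
```

Calling `fit_transform` on the scoring data instead would re-learn the mean from that data, which is why the card says to use `transform` at deployment.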
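
The confusion matrix and False Negative Rate cards can be worked through on a small example. This is a sketch under stated assumptions: the labels below are invented for illustration (1 = positive case, 0 = negative case), and the helper `confusion_counts` is a hypothetical name, not a lab function.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (TP, TN, FP, FN) counts for a binary classifier."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1  # positive case correctly flagged
        elif a != positive and p != positive:
            tn += 1  # negative case correctly cleared
        elif a != positive and p == positive:
            fp += 1  # negative case wrongly flagged
        else:
            fn += 1  # positive case missed
    return tp, tn, fp, fn

# Invented labels for illustration only.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(actual, predicted)

# False Negative Rate: share of actual positives the model missed.
false_negative_rate = fn / (fn + tp)

print(tp, tn, fp, fn)        # 2 3 1 2
print(false_negative_rate)   # 0.5
```

Here two of the four actual positives were predicted negative, so half of the at-risk cases would have been missed.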