CSCI 343 Exam 2
| Term | Definition |
|---|---|
| Explain the primary difference between supervised and unsupervised learning, specifically regarding the data used | Supervised learning uses labelled data, while unsupervised learning uses unlabeled data |
| When would you choose to use a regression model over a classification model? | When your labels are numerical (usually continuous) values instead of categorical values |
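| In a simple linear regression model $\hat{y} = \theta_0 + \theta_1 x$, what does the parameter $\theta_0$ represent? | $\theta_0$ is the y-intercept: the predicted value of y when x = 0 |
| In a simple linear regression model $\hat{y} = \theta_0 + \theta_1 x$, what does the parameter $\theta_1$ represent? | $\theta_1$ is the slope (or weight) associated with x: the change in $\hat{y}$ for a one-unit increase in x |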
| What is the main advantage of using Root Mean Squared Error (RMSE) as a loss function compared to Mean Squared Error (MSE) | RMSE is more interpretable since it is in the same units as the y (label) data |
| Compare Mean Absolute Error (MAE) to Mean Squared Error (MSE). How does their sensitivity to large errors differ? | Since MSE squares the error, it is more sensitive to large errors than MAE |
| Why is Adjusted R^2 often preferred over R^2 when evaluating a multiple linear regression model? | Adjusted R^2 takes into account the number of features, and penalizes the addition of features that do not improve the model's performance more than would be expected by chance |
| In Adjusted R^2, what does variable n represent? | n is the number of data points |
| In Adjusted R^2, what does variable p represent? | p is the number of features |
| Give an example of when it is a good idea to use a hexbin plot | When you have too many data points for the graph to be readable if you plotted them individually. |
| Give an example of when you may need to use log scale | When plotting large values of money (income/salary, property value, loan amount, etc.) |
| What is the key difference between a histogram and a bar plot? | Histograms are used for quantitative data while bar plots are used for categorical data |
| Why should stacked plots be avoided? | They are hard to read: only the bottom segment sits on a common baseline, so the other segments are difficult to compare |
| What is the purpose of exploring data? | To understand patterns, detect problems, and prepare data for analysis and modeling |
| What are some common issues spotted using summary statistics? | Missing values, invalid values or outliers, data ranges that are too wide/narrow, and inconsistent units |
| What is the goal of data visualization? | To communicate data insights clearly and effectively with minimal cognitive strain for the viewer. |
| What do you need to keep in mind when creating data visualizations? | 1. Show as much information as possible 2. Make the data stand out while maintaining clarity 3. Avoid clutter and too many elements 4. Choose proper aspect ratios and scaling 5. Center the data properly; avoid skewing it to one side |
| What are the three iterative stages of data visualization? | 1. Graph the data 2. Learn from it 3. Regraph to answer new questions |
| What types of graphs can be used for single-variable distributions? | Pie charts, histograms, density plots, and bar charts |
| When should you use a pie chart? | For showing proportions of a whole that sum to 100%, with few categories (ideally 5 or fewer) |
| Why are bar charts often preferred to pie charts? | They are easier to compare visually, especially when there are more categories |
| What does a histogram show? | The frequency of data within fixed-width bins or intervals. |
| Example of histogram use | Grouping customers' gas bills into $10 intervals to count frequency. |
| What is a density plot? | A smoothed, continuous version of a histogram, where the total area under the curve equals 1. |
| What does a point on a density plot represent? | The relative density of the data near that value; the fraction of data in an interval is the area under the curve over that interval |
| What's most important in a density plot? | The shape of the curve, not exact y-values |
| When should you use a logarithmic scale? | When percent change or order of magnitude is more meaningful than absolute values, or when data is heavily skewed. |
| What is a bar chart used for? | displaying frequencies of categorical data or discrete variables |
| Can a bar chart start above zero? | Sometimes, but it depends on context and the story being told |
| What are 4 questions data visualizations can help answer? | 1. What is the peak value? 2. How many peaks? (uni/bi modal) 3. How much variation exists? 4. Is the data concentrated in certain intervals/categories? |
| Which plots work best for 2 continuous variables? | Line graphs, scatter plots, hexbin plots |
| Which plots are best for 2 discrete variables? | Stacked bar charts or side-by-side graphs |
| What makes line graphs effective? | Each x-value should correspond to a unique y-value; no clutter or excess lines (limit to 3-4) |
| What is an area chart? | A line graph with color shading below the line to show magnitude (a mix between a bar chart and a line chart) |
| What is a hexbin plot? | A 2D histogram that shows data density using color or shading |
| Why might stacked bar charts be difficult to read? | Because comparing values across stacks is visually harder than comparing side-by-side bars |
| What's a good alternative to stacked bar charts? | Side-by-side charts or multiple small plots |
| What is a model? | An idealized representation of a system, such as a weather forecast model |
| What is ML used for? | Building predictive and decision-making models |
| What are the main stages of an ML project? | 1. Problem Definition 2. Data Collection 3. Data Preparation & Preprocessing 4. Model Building 5. Evaluation 6. Deployment |
| What are the 3 main types of ML algorithms? | Supervised, Unsupervised, and Reinforcement learning |
| What is supervised learning? | Learning from labeled data to predict outputs (classification or regression) |
| What is unsupervised learning? | Learning from unlabeled data to find hidden patterns or clusters |
| What is reinforcement learning? | Learning to make sequential decisions through trial and error in an environment |
| What is the goal of supervised learning? | To find relationships between inputs (features) and outputs (targets) |
| What does a regression model predict? | Continuous numerical outcomes |
| What does a classification model predict? | Categorical outcomes (high-risk/low-risk, tumor type, dog breed) |
| Predicting rainfall: what model would be used? | Regression (continuous output) |
| Predicting gender from an image: what model would be used? | Classification (categorical output) |
| Predicting house prices: what model would be used? | Regression (continuous output) |
| What type of learning does linear regression use? | Supervised learning |
| What does linear regression do? | Estimates relationships between a dependent variable and one or more independent variables |
| What is the equation for simple linear regression? | $\hat{y} = \theta_0 + \theta_1 x$ (equivalent to $y = mx + b$) |
| Why are estimates imperfect? | Because data has variability and noise; models simplify reality |
| What does covariance measure? | How two variables vary together |
| What does positive covariance indicate? | When one variable increases, the other tends to increase |
| What does negative covariance indicate? | When one variable increases, the other tends to decrease |
| What does zero covariance indicate? | No consistent linear relationship between the variables |
| Why is covariance hard to interpret? | Its value depends on the units (scale) of the variables, so its size cannot be compared across different pairs of variables |
| What is correlation? | a standardized form of covariance that measures the strength and direction of a linear relationship. |
| What range do correlation values fall within? | Between -1 (perfect negative) and +1 (perfect positive) |
| Does correlation imply causation? | No! Correlation alone does not show that one variable causes the other. |
| What is the adjusted R^2 value? | A version of R^2 adjusted for the number of features used; it only increases if a new feature improves the model more than would be expected by chance. |
| What is RMSE (Root Mean Square Error)? | A measure of model prediction error: the square root of the average squared differences between predicted and actual values |
| What does standard deviation represent? | The typical distance of values from the mean (the square root of the variance); a measure of spread |
| What type of task does a logistic regression model perform? | Classification: predicting categorical outcomes |
| Give an example of a logistic regression use case | Diagnosing whether a tumor is benign or malignant, predicting COVID test results |
| What kind of outputs does logistic regression produce? | Probabilities that map to categorical classes |
| What is the purpose of data visualization in data analysis? | To explore, understand, and communicate insights from data clearly. |
| What is the main goal of linear regression? | To model and predict a continuous dependent variable based on one or more independent variables. |
| What is the goal of logistic regression? | To model classification tasks where the outcome is categorical, such as "yes/no" or "positive/negative" |
| How is logistic regression different from linear regression? | Linear regression predicts continuous outputs; logistic regression predicts categorical outcomes by estimating probabilities between 0 and 1 |
| What is the range of logistic regression output values? | Between 0 and 1 (probability values) |
| What function converts real-valued outputs into probabilities? | The logistic (sigmoid) function |
| What is the formula for the sigmoid function? | $\sigma(x) = \frac{1}{1 + e^{-x}}$ |
| What does the sigmoid function do in logistic regression? | Maps any real number to a value between 0 and 1, representing probability |
| What type of function is the sigmoid in machine learning? | an activation function |
| How can you classify data using a regression model? | Draw a cutoff (threshold). Values above it are classified as one class, values below it as another. |
| What is a classification threshold? | The probability value used to decide which class a data point belongs to (commonly 0.5) |
| What loss function is used in binary logistic regression? | Binary Cross Entropy (BCE) Loss |
| What is the goal of BCE Loss? | To measure how well predicted probabilities match actual binary outcomes. |
| What activation function is paired with BCE Loss? | The sigmoid function |
| What loss function is used in multinomial logistic regression? | Categorical Cross Entropy (CCE) Loss |
| What activation function is used for multiclass logistic regression? | The softmax function |
| What does the softmax function do? | converts a vector of raw scores into probabilities that sum to 1 across all classes |
| How is the predicted class label determined in multiclass classification? | By the class with the highest probability from the softmax output |
| What is a confusion matrix? | a table comparing predicted vs actual outcomes, showing counts of true/false positives and negatives |
| What are the four basic terms in a confusion matrix? | 1. TP (True Positive): Model correctly predicts positive class. 2. FP (False Positive): Model incorrectly predicts positive 3. TN (True Negative): Model correctly predicts negative class 4. FN (False Negative): Model incorrectly predicts negative |
| What does precision measure? | The proportion of correctly predicted positive samples out of all predicted positives |
| Precision formula | TP/ (TP + FP) |
| Why is high precision important? | When false positives are costly, like in medical diagnoses |
| What does Recall (sensitivity) measure? | The proportion of actual positives that were correctly identified |
| Recall formula | TP/(TP+FN) |
| Why is high recall important? | When missing a positive case is costly, such as fraud detection or disease screening |
| What is Specificity? | The proportion of actual negatives that were correctly identified. |
| Specificity Formula | TN/(TN+FP) |
| What is False Positive Rate? (FPR) | The proportion of negative samples incorrectly predicted as positive |
| False Positive Rate formula | FP/(FP + TN) |
| What is False Negative Rate? (FNR) | The proportion of positive samples incorrectly predicted as negative. |
| False Negative Rate formula | FN/(FN+TP) |
| What is Accuracy? | The overall percentage of correctly classified samples |
| Accuracy Formula | (TP+TN)/(TP+TN+FP+FN) |
| When is accuracy misleading? | When classes are imbalanced (e.g., 99% negatives, 1% positives): a model that always predicts negative reaches 99% accuracy while detecting no positives |
| What is the F1 Score? | The harmonic mean of precision and recall |
| F1 Formula | 2 * (precision * recall)/(precision + recall) |
| Why use the F1 score? | It balances false positives and false negatives, especially useful with imbalanced datasets |
| What is a ROC curve? | A plot of the True Positive Rate (Recall) against the False Positive Rate at various thresholds |
| What does the Area Under the Curve (AUC) represent? | The model's ability to distinguish between classes |
| What does AUC =1 mean? | Perfect Classifier |
| What does AUC =.5 mean? | No discriminative ability (random guessing) |
| What does a higher AUC indicate? | Better classification performance. |
| Why should training and testing data be separate? | To detect overfitting: the model must be evaluated on unseen data to assess generalization |
| Why can summary metrics be misleading? | They may hide issues like poor data preprocessing or implementation errors |
| What is the best way to evaluate model performance? | Use unseen (test) data and multiple metrics, not just accuracy |
| What is the difference between training, validation, and test sets? | Training data teaches the model; validation tunes it; test data evaluates final performance |
| What is multiple linear regression? | A model that uses two or more features to predict a target. |
| Multiple Linear Regression Formula | $\hat{y} = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$ |
| What does adding more features to a model do? | Can improve accuracy if features are useful, but risks overfitting |
| What is a loss function? | A formula that quantifies how far predictions are from actual values |
| List common loss functions for regression | MAE, MSE, RMSE |
| Define Mean Absolute Error (MAE) | Average of the absolute differences $\lvert y_i - \hat{y}_i \rvert$ (less sensitive to outliers) |
| Define Mean Squared Error (MSE) | Average of the squared differences $(y_i - \hat{y}_i)^2$; penalizes large errors more |
| Define Root Mean Squared Error (RMSE) | Square root of MSE; interpretable in same units as target variable |
| What is the goal when fitting a model? | Minimize the average loss on training data |
| What is R^2? | The coefficient of determination: the proportion of the variance in y explained by the model; the best score is 1, and it can be negative if the model fits worse than simply predicting the mean |
| How does logistic regression produce probabilities? | It applies the sigmoid function to a linear equation, giving output values between 0 and 1 |
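
The histogram and density-plot cards in the table above (fixed-width bins, the $10 gas-bill example, and total area under a density curve equal to 1) can be illustrated with a short Python sketch. The function name `histogram`, its `density` flag, and the bill amounts are hypothetical choices made for illustration. The sketch assumes a non-empty list of numbers and half-open bins [left, right).

```python
from collections import Counter


def histogram(values, bin_width, density=False):
    """Group values into fixed-width bins [k*w, (k+1)*w) and return
    (left_edge, right_edge, height) rows, including empty bins in between.

    With density=True each height is count / (n * bin_width), so the bar
    areas sum to 1, the same normalization a density plot uses.
    """
    n = len(values)
    # Floor division picks the bin index for each value.
    counts = Counter(int(v // bin_width) for v in values)
    rows = []
    for k in range(min(counts), max(counts) + 1):
        height = counts[k] / (n * bin_width) if density else counts[k]
        rows.append((k * bin_width, (k + 1) * bin_width, height))
    return rows


# Hypothetical monthly gas bills in dollars, grouped into $10 bins.
bills = [23.5, 27.0, 31.2, 38.9, 40.0, 45.1, 47.3, 52.8]
for left, right, count in histogram(bills, 10):
    print(f"${left}-${right}: {count}")
```

Including the empty bins matters: a gap in the data shows up as a zero-height bar instead of silently disappearing.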
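
The covariance, correlation, and standard deviation cards, together with the simple linear regression equation $\hat{y} = \theta_0 + \theta_1 x$, can be tied together in one small sketch. It assumes sample statistics (dividing by n - 1) and fits the line by ordinary least squares, which minimizes MSE and gives $\theta_1 = \operatorname{cov}(x, y) / \operatorname{var}(x)$ and $\theta_0 = \bar{y} - \theta_1 \bar{x}$. The course may fit models another way, such as gradient descent. All function names and data values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(x, y):
    # Sample covariance: summed products of deviations from the means, over n - 1.
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def std_dev(values):
    # Sample standard deviation: square root of the variance, i.e. cov(v, v).
    return math.sqrt(covariance(values, values))


def correlation(x, y):
    # Pearson correlation: covariance divided by both standard deviations,
    # which removes the units and bounds the result to [-1, 1].
    return covariance(x, y) / (std_dev(x) * std_dev(y))


def fit_simple_linear_regression(x, y):
    # Ordinary least squares for y_hat = theta0 + theta1 * x.
    theta1 = covariance(x, y) / covariance(x, x)   # slope = cov(x, y) / var(x)
    theta0 = mean(y) - theta1 * mean(x)            # line passes through the means
    return theta0, theta1


x = [1, 2, 3, 4]
y = [3, 5, 7, 9]                    # hypothetical points on the line y = 1 + 2x
y_cents = [300, 500, 700, 900]      # the same y values in different units
print(fit_simple_linear_regression(x, y))            # (1.0, 2.0)
print(covariance(x, y), covariance(x, y_cents))      # covariance grows 100x
print(correlation(x, y), correlation(x, y_cents))    # both approximately 1.0
print(covariance(x, [1, 3, 3, 1]))                   # 0.0: the pattern is not linear
```

Changing the units of y multiplies the covariance by the same factor but leaves the correlation unchanged, which is why correlation is easier to interpret. The last line has zero covariance even though y clearly depends on x (it rises and then falls), because both measures only capture linear relationships. Using n instead of n - 1 changes covariance and variance by the same factor, so the fitted slope and the correlation are unaffected.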
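
The regression loss cards (MAE, MSE, RMSE) and the R^2 and Adjusted R^2 cards reduce to a few lines of arithmetic. Below is a minimal pure-Python sketch. It assumes equal-length lists of actual and predicted values and a target that is not constant (otherwise R^2 divides by zero). It also requires n > p + 1 for adjusted R^2, computed as $R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}$ with n data points and p features. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred, p):
    """Return MAE, MSE, RMSE, R^2 and adjusted R^2 for one set of predictions.

    p is the number of features the model used; it only affects adjusted R^2.
    """
    n = len(y_true)
    errors = [y - y_hat for y, y_hat in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)                # residual sum of squares
    mae = sum(abs(e) for e in errors) / n              # average absolute error
    mse = ss_res / n                                   # average squared error
    rmse = math.sqrt(mse)                              # same units as y
    mean_y = sum(y_true) / n
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)    # total sum of squares
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)      # penalizes extra features
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2, "adj_r2": adj_r2}


# Hypothetical actual and predicted values for a one-feature model.
print(regression_metrics([3, 5, 7, 9], [2, 5, 8, 9], p=1))

# Same MAE, different MSE: squaring punishes one large error more.
spread_out = regression_metrics([1, 2, 3, 4], [2, 3, 4, 5], p=1)
one_big = regression_metrics([1, 2, 3, 4], [1, 2, 3, 8], p=1)
print(spread_out["mae"], one_big["mae"])  # 1.0 1.0
print(spread_out["mse"], one_big["mse"])  # 1.0 4.0
```

Both prediction sets in the second example are off by a total of 4, but concentrating that error in one point quadruples the MSE. This is the extra sensitivity to large errors described in the table.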
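
The sigmoid, classification-threshold, and BCE cards describe one pipeline. A linear score is squashed into a probability, compared with a threshold, and scored with binary cross-entropy. The sketch below assumes a single feature with hypothetical parameters `theta0` and `theta1`. It uses a default threshold of 0.5, where a probability exactly at the threshold counts as positive (a convention choice). It also clips probabilities with a small `eps` so the loss never takes log(0).

```python
import math


def sigmoid(z):
    # Logistic function 1 / (1 + e^(-z)), written in two branches so that
    # math.exp never receives a large positive argument (avoids OverflowError).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(theta0, theta1, x):
    # One-feature logistic regression: sigmoid of the linear score.
    return sigmoid(theta0 + theta1 * x)


def classify(prob, threshold=0.5):
    # Probabilities at or above the threshold are labeled 1 (positive), others 0.
    return 1 if prob >= threshold else 0


def bce_loss(y_true, y_prob, eps=1e-12):
    # Mean binary cross-entropy: -[y log p + (1 - y) log(1 - p)] per sample.
    # eps clips p away from exactly 0 or 1 so log() stays finite.
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Hypothetical parameters chosen so the linear score at x = 2 is exactly 0.
p = predict_proba(theta0=-3.0, theta1=1.5, x=2.0)
print(p, classify(p))                                # 0.5 1
print(bce_loss([1, 0], [0.9, 0.1]), bce_loss([1, 0], [0.6, 0.4]))
```

The last line shows that confident, correct probabilities earn a lower BCE than hesitant ones, even when both would be classified correctly at a 0.5 threshold.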
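
The multiclass cards (softmax, Categorical Cross Entropy, and picking the class with the highest probability) can be sketched in the same way. This sketch assumes raw scores for a single sample and a one-hot encoded true label. Subtracting the maximum score before exponentiating is a standard numerical-stability step that leaves the probabilities unchanged. The function names are hypothetical.

```python
import math


def softmax(scores):
    # Shift by the max score so math.exp cannot overflow; adding the same
    # constant to every score does not change the softmax output.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_class(scores):
    # Index of the class with the highest softmax probability.
    probs = softmax(scores)
    return max(range(len(probs)), key=lambda k: probs[k])


def cce_loss(y_onehot, probs, eps=1e-12):
    # Categorical cross-entropy for one sample: -sum_k y_k * log(p_k).
    # eps keeps log() finite if a probability underflows to 0.
    return -sum(y * math.log(max(p, eps)) for y, p in zip(y_onehot, probs))


scores = [2.0, 1.0, 0.1]              # hypothetical raw scores for 3 classes
probs = softmax(scores)
print(probs, sum(probs))              # probabilities summing to 1 (up to rounding)
print(predicted_class(scores))        # 0
print(cce_loss([1, 0, 0], probs))     # equals -log(probs[0])
```

Averaging `cce_loss` over all samples gives the loss for a dataset. With two classes, softmax and CCE reduce to the sigmoid with binary cross-entropy.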
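
The confusion-matrix formulas in the table (precision, recall, specificity, FPR, FNR, accuracy, F1) can all be computed from four counts. This sketch assumes binary labels with 1 as the positive class. When a denominator is zero (for example, precision when the model never predicts positive), it returns 0.0 rather than raising an error. That is an assumed convention; libraries handle this case in different ways. The function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def _ratio(num, den):
    # Assumed convention: report 0.0 instead of raising when the denominator is 0.
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)                 # sensitivity, true positive rate
    return {
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(tn, tn + fp),
        "fpr": _ratio(fp, fp + tn),              # equals 1 - specificity
        "fnr": _ratio(fn, fn + tp),              # equals 1 - recall
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


counts = confusion_counts([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(counts)                                    # (2, 1, 4, 1)
print(classification_metrics(*counts))

# Imbalanced data: 1 positive, 99 negatives, and a model that always says "negative".
lazy = classification_metrics(*confusion_counts([1] + [0] * 99, [0] * 100))
print(lazy["accuracy"], lazy["recall"], lazy["f1"])   # 0.99 0.0 0.0
```

The second example reproduces the imbalanced-data warning from the table. Always predicting negative scores 99% accuracy but has zero recall and zero F1.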
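
To make the ROC and AUC cards concrete, sweep the classification threshold over a model's scores and integrate the resulting curve with the trapezoidal rule. This is a simplified sketch. It assumes both classes are present and predicts positive when score >= threshold. It also recounts at every threshold (quadratic time), which is fine for small examples but is not how optimized libraries compute it. The function names and scores are hypothetical.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, from (0, 0) to (1, 1)."""
    pos = sum(1 for y in y_true if y == 1)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]                      # threshold above every score
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    # Trapezoidal rule; points are already in non-decreasing FPR order.
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]                 # hypothetical model scores
print(roc_points(labels, scores))
print(auc(roc_points(labels, scores)))         # 0.75
print(auc(roc_points(labels, [0.5] * 4)))      # 0.5: constant scores, no discrimination
```

A constant score produces the diagonal from (0, 0) to (1, 1) and an AUC of 0.5, the "random guessing" case in the table. Scores that perfectly separate the classes give 1.0.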
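
A shuffled three-way split illustrates the cards on separating training, validation, and test data. The 70/15/15 proportions, the fixed `seed`, and the function name are assumptions made for illustration. The key property is that the three sets never share a row, so the test set stays unseen until the final evaluation.

```python
import random


def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle rows and cut them into disjoint train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)      # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]          # everything left is for training
    return train, val, test


train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))        # 70 15 15
```

In the workflow described in the table, the model is fit on `train`, choices such as the classification threshold are tuned on `val`, and `test` is used once at the end.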