Question 1

Explain the primary difference between supervised and unsupervised learning, specifically regarding the data used

Accepted Answer

Supervised learning uses labeled data, while unsupervised learning uses unlabeled data

Question 2

When would you choose to use a regression model over a classification model?

Accepted Answer

When your labels are numerical (usually continuous) values instead of categorical values

Question 3

In a simple linear regression model ŷ = θ0 + θ1x, what does the parameter θ0 represent?

Accepted Answer

θ0 is the y-intercept: the predicted value ŷ when x = 0

Question 4

In a simple linear regression model ŷ = θ0 + θ1x, what does the parameter θ1 represent?

Accepted Answer

θ1 is the slope (or weight) associated with x
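
As a minimal sketch of how the intercept and slope can be estimated, the following uses the ordinary least squares closed form on a small made-up dataset. The function name `fit_simple_linear_regression` and the data are illustrative assumptions, not part of the course material.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (theta0, theta1) minimizing squared error for y_hat = theta0 + theta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    theta1 = sxy / sxx
    # Intercept: the fitted line passes through (mean_x, mean_y).
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1


# Hypothetical data generated from y = 2 + 3x with no noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 5.0, 8.0, 11.0, 14.0]
theta0, theta1 = fit_simple_linear_regression(xs, ys)
print(theta0, theta1)  # intercept and slope recovered from the data
```

Because the data lie exactly on a line, the fit recovers an intercept of 2 and a slope of 3; with noisy data the estimates would only approximate the underlying values.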

Question 5

What is the main advantage of using Root Mean Squared Error (RMSE) as a loss function compared to Mean Squared Error (MSE)?

Accepted Answer

RMSE is more interpretable since it is in the same units as the y (label) data
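
The units point can be seen in a small sketch. The prices below are hypothetical, and the helper names `mse` and `rmse` are chosen here for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared differences (units are squared)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: square root of MSE (same units as y)."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical house prices in dollars.
actual = [200_000, 250_000, 300_000]
predicted = [210_000, 240_000, 320_000]

mse_value = mse(actual, predicted)    # in dollars squared
rmse_value = rmse(actual, predicted)  # in dollars
print(mse_value, rmse_value)
```

Here the MSE is 200,000,000 (dollars squared), which is hard to relate to a house price, while the RMSE is roughly 14,142 dollars, which can be read directly as a typical error size.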

Question 6

Compare Mean Absolute Error (MAE) to Mean Squared Error (MSE). How does their sensitivity to large errors differ?

Accepted Answer

Since MSE squares the error, it is more sensitive to large errors than MAE
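
A short sketch with made-up error values shows the difference. The helpers take a list of errors (prediction minus actual) directly, which is a simplification chosen for this example.

```python
def mae(errors):
    """Mean absolute error over a list of errors."""
    return sum(abs(e) for e in errors) / len(errors)


def mse(errors):
    """Mean squared error over a list of errors."""
    return sum(e ** 2 for e in errors) / len(errors)


# Hypothetical errors: the second list replaces one small error with a large one.
small_errors = [1, 1, 1, 1]
with_outlier = [1, 1, 1, 10]

print(mae(small_errors), mse(small_errors))  # both 1.0
print(mae(with_outlier), mse(with_outlier))  # 3.25 and 25.75
```

One large error raises MAE from 1 to 3.25 but raises MSE from 1 to 25.75, so a model trained to minimize MSE is pulled much harder toward fitting outliers.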

Question 7

Why is Adjusted R^2 often preferred over R^2 when evaluating a multiple linear regression model?

Accepted Answer

Adjusted R^2 takes into account the number of features, and penalizes the addition of features that do not improve the model's performance more than random chance

Question 8

In Adjusted R^2, what does variable n represent?

Accepted Answer

n is the number of data points

Question 9

In Adjusted R^2, what does variable p represent?

Accepted Answer

p is the number of features
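
With n and p defined, the usual formula is Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1). The sketch below implements it; the function name `adjusted_r2` and the example numbers are assumptions for illustration.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a model with p features fit on n data points."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 for adjusted R^2 to be defined")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical case: R^2 stays at 0.5 after adding a third, uninformative feature.
with_two_features = adjusted_r2(0.5, n=11, p=2)
with_three_features = adjusted_r2(0.5, n=11, p=3)
print(with_two_features, with_three_features)
```

When R^2 does not change, increasing p lowers the adjusted value (0.375 with two features, about 0.286 with three), which is the penalty for features that add nothing.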

Question 10

Give an example of when it is a good idea to use a hexbin plot

Accepted Answer

When you have too many data points for the graph to be readable if you plotted them individually.

Question 11

Give an example of when you may need to use a log scale

Accepted Answer

When plotting large values of money (income/salary, property value, loan amount, etc.)

Question 12

What is the key difference between a histogram and a bar plot?

Accepted Answer

Histograms are used for quantitative data while bar plots are used for categorical data

Question 13

Why should stacked plots be avoided?

Accepted Answer

They are hard to read and understand

Question 14

What is the purpose of exploring data?

Accepted Answer

To understand patterns, detect problems, and prepare data for analysis and modeling

Question 15

What are some common issues spotted using summary statistics?

Accepted Answer

Missing values, invalid values or outliers, data ranges that are too wide/narrow, and inconsistent units

Question 16

What is the goal of data visualization?

Accepted Answer

To communicate data insights clearly and effectively with minimal cognitive strain for the viewer.

Question 17

What do you need to keep in mind when creating data visualizations?

Accepted Answer

1. Show as much information as possible
2. Make the data stand out and maintain clarity
3. Avoid clutter and too many elements
4. Choose proper aspect ratios and scaling
5. Center data properly; avoid skewing it to one side

Question 18

What are the three iterative stages of data visualization?

Accepted Answer

1. Graph the data
2. Learn
3. Regraph to answer new questions

Question 19

What types of graphs can be used for single-variable distributions?

Accepted Answer

Pie charts, histograms, density plots, and bar charts

Question 20

When should you use a pie chart?

Accepted Answer

For showing proportions that sum to 100%, with few categories (ideally 5 or fewer)

Question 21

Why are bar charts often preferred to pie charts?

Accepted Answer

They are easier to compare visually, especially when there are more categories

Question 22

What does a histogram show?

Accepted Answer

The frequency of data within fixed-width bins or intervals.

Question 23

Give an example of when to use a histogram

Accepted Answer

Grouping customers' gas bills into $10 intervals to count frequency.
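
The binning step behind that histogram can be sketched as follows. The bill amounts are made up, and `bin_counts` is a hypothetical helper name.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical monthly gas bills in dollars.
bills = [42, 47, 51, 58, 59, 63, 88]
counts = bin_counts(bills, 10)
print(counts)  # {40: 2, 50: 3, 60: 1, 80: 1}
```

Each key is the lower edge of a $10 interval, so 40 stands for bills from $40 up to but not including $50. Empty bins (here $70 to $80) are simply absent from the result; a plotted histogram would show them as gaps.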

Question 24

What is a density plot?

Accepted Answer

A smoothed, continuous version of a histogram, where the area under the curve = 1.
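
To illustrate the area property, the sketch below builds a density curve with a Gaussian kernel density estimate and integrates it numerically. The data, the bandwidth of 0.5, and the helper names are all assumptions for the example; plotting libraries typically pick the bandwidth automatically.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a density function: the average of Gaussian bumps centered on each data point."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)

    return density


def area_under(f, lo, hi, steps=2000):
    """Trapezoidal approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + i * h) for i in range(1, steps))
    return total * h


data = [1.0, 2.0, 2.5, 3.0, 6.0]  # hypothetical measurements
density = gaussian_kde(data, bandwidth=0.5)

total_area = area_under(density, -5.0, 12.0)    # close to 1
fraction_2_to_3 = area_under(density, 2.0, 3.0) # estimated share of data between 2 and 3
print(total_area, fraction_2_to_3)
```

The total area comes out very close to 1, and the area over any interval estimates the share of data falling in that interval.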

Question 25

What does a point on a density plot represent?

Accepted Answer

The relative density of data near that value; the fraction of data within an interval is the area under the curve over that interval

Question 26

What's most important in a density plot?

Accepted Answer

The shape of the curve, not exact y-values

Question 27

When should you use a logarithmic scale?

Accepted Answer

When percent change or order of magnitude is more meaningful than absolute values, or when data is heavily skewed.
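
A small sketch of the order-of-magnitude case, using made-up income values: on a linear axis the gaps between successive values grow tenfold each time, while on a log10 axis they are evenly spaced.

```python
import math

# Hypothetical incomes spanning several orders of magnitude.
incomes = [1_000, 10_000, 100_000, 1_000_000]
log_incomes = [math.log10(v) for v in incomes]

linear_gaps = [b - a for a, b in zip(incomes, incomes[1:])]
log_gaps = [b - a for a, b in zip(log_incomes, log_incomes[1:])]

print(linear_gaps)  # [9000, 90000, 900000]
print(log_gaps)     # each gap is about 1, one order of magnitude
```

Equal spacing on the log axis means each step represents the same multiplicative change (here, a factor of 10), which is why log scales suit data where ratios matter more than differences.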

Question 28

What is a bar chart used for?

Accepted Answer

Displaying frequencies of categorical data or discrete variables

Question 29

Can a bar chart start above zero?

Accepted Answer

Sometimes, but it depends on context and the story being told

Question 30

What are 4 questions data visualizations can help answer?

Accepted Answer

1. What is the peak value?
2. How many peaks? (unimodal/bimodal)
3. How much variation exists?
4. Is the data concentrated in certain intervals/categories?

Question 31

Which plots work best for 2 continuous variables?

Accepted Answer

Line graphs, scatter plots, hexbin plots

Question 32

Which plots are best for 2 discrete variables?

Accepted Answer

Stacked bar charts or side-by-side bar charts

Question 33

What makes line graphs effective?

Accepted Answer

Each x-value corresponds to a unique y-value, and the chart is free of clutter and excess lines (limit to 3-4)

Question 34

What is an area chart?

Accepted Answer

A line graph with color shading below the line to show magnitude (a mix between a bar chart and a line chart)

Question 35

What is a hexbin plot?

Accepted Answer

A 2D histogram that shows data density using color or shading
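
The counting idea behind a 2D histogram can be sketched as below. Note the simplification: this uses square cells rather than hexagons, since the binning logic is easier to follow, and the points and the `bin_2d` helper are made up for illustration.

```python
from collections import Counter


def bin_2d(points, cell_size):
    """Count points per square cell, keyed by the cell's (column, row) index."""
    return Counter((int(x // cell_size), int(y // cell_size)) for x, y in points)


# Hypothetical (x, y) observations.
points = [(0.1, 0.2), (0.4, 0.3), (0.9, 0.8), (1.2, 0.1), (1.4, 0.4), (1.6, 0.2)]
cells = bin_2d(points, 1.0)
print(cells)  # three points fall in cell (0, 0) and three in cell (1, 0)
```

A plotting library would then map each cell's count to a color. For actual hexagonal bins, matplotlib provides a `hexbin` plotting function.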

Question 36

Why might stacked bar charts be difficult to read?

Accepted Answer

Because comparing values across stacks is visually harder than comparing side-by-side bars

Question 37

What's a good alternative to stacked bar charts?

Accepted Answer

Side-by-side charts or multiple small plots

Question 38

What is a model?

Accepted Answer

An idealized representation of a system, such as a weather forecast

Question 39

What is ML used for?

Accepted Answer

Generating decision-making and predictive models

Question 40

What are the main stages of an ML project?

Accepted Answer

1. Problem Definition
2. Data Collection
3. Data Preparation & Preprocessing
4. Model Building
5. Evaluation
6. Deployment

Question 41

What are the 3 main types of ML algorithms?

Accepted Answer

Supervised, Unsupervised, and Reinforcement learning

Question 42

What is supervised learning?

Accepted Answer

Learning from labeled data to predict outputs (classification or regression)

Question 43

What is unsupervised learning?

Accepted Answer

Learning from unlabeled data to find hidden patterns or clusters

Question 44

What is reinforcement learning?

Accepted Answer

Learning to make sequential decisions through trial and error in an environment

Question 45

What is the goal of supervised learning?

Accepted Answer

To find relationships between inputs (features) and outputs (targets)

Question 46

What does a regression model predict?

Accepted Answer

Continuous numerical outcomes

Question 47

What does a classification model predict?

Accepted Answer

Categorical outcomes (high-risk/low-risk, tumor type, dog breed)

Question 48

Predicting rainfall: what model would be used?

Accepted Answer

Regression (continuous output)

Question 49

Predicting gender from an image: what model would be used?

Accepted Answer

Classification (categorical)

Question 50

Predicting house prices: what model would be used?

Accepted Answer

Regression

Question 51

What type of learning does linear regression use?

Accepted Answer

Supervised learning

Question 52

What does linear regression do?

Accepted Answer

Estimates relationships between a dependent variable and one or more independent variables

Question 53

What is the equation for simple linear regression?

Accepted Answer

ŷ = θ0 + θ1x, or equivalently y = mx + b

Question 54

Why are estimates imperfect?

Accepted Answer

Because data has variability and noise, and models simplify reality

Question 55

What does covariance measure?

Accepted Answer

How two variables vary together

Question 56

What does positive covariance indicate?

Accepted Answer

When one variable increases, the other tends to increase

Question 57

What does negative covariance indicate?

Accepted Answer

When one variable increases, the other tends to decrease

Question 58

What does zero covariance indicate?

Accepted Answer

No consistent relationship between the variables

Question 59

Why is covariance hard to interpret?

Accepted Answer

Its value depends on the units of the variables
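
The unit dependence can be shown with a short sketch. The heights and weights are made up, and `covariance` here is the sample covariance (dividing by n - 1), which is one common convention.

```python
def covariance(xs, ys):
    """Sample covariance of two equal-length sequences (divides by n - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical data: the same heights expressed in meters and in centimeters.
heights_m = [1.5, 1.6, 1.7, 1.8]
weights_kg = [50.0, 58.0, 66.0, 74.0]
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
print(cov_m, cov_cm)  # the second is about 100 times the first
```

The relationship between height and weight is identical in both cases, yet the covariance changes by a factor of 100 just from switching units, so its magnitude alone says little about strength.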

Question 60

What is correlation?

Accepted Answer

A standardized form of covariance that measures the strength and direction of a linear relationship.

Question 61

What range do correlation values fall within?

Accepted Answer

Between -1 (perfect negative) and +1 (perfect positive)
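
A minimal sketch of the Pearson correlation coefficient, computed as covariance divided by the product of the standard deviations. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y scaled by each variable's spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


x = [1.0, 2.0, 3.0, 4.0]
r_positive = pearson_r(x, [2.0, 4.0, 6.0, 8.0])   # y rises with x in a straight line
r_negative = pearson_r(x, [8.0, 6.0, 4.0, 2.0])   # y falls with x in a straight line
r_rescaled = pearson_r([v * 100 for v in x], [2.0, 4.0, 6.0, 8.0])  # x in different units
print(r_positive, r_negative, r_rescaled)
```

The two straight-line cases give values at the ends of the range, and rescaling x leaves the result unchanged, which is what makes correlation easier to interpret than covariance.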

Question 62

Does correlation imply causation?

Accepted Answer

No. Correlation alone does not imply causation

Question 63

What is the adjusted R^2 value?

Accepted Answer

A version of R^2 adjusted for the number of features used; it only increases if a new feature improves the model more than would be expected by chance.

Question 64

What is RMSE (Root Mean Square Error)?

Accepted Answer

A measure of model prediction error: the square root of the average squared difference between predicted and actual values

CSCI 343 Exam 2