Exam 1 Part 2
Data Science Overview
Question | Answer |
---|---|
What is supervised learning? | Predictive learning: the science of using data to predict an outcome (a click, a subscription, cancerous cells, the price of a stock). The model learns from data in which each input comes with a ‘correct’ answer (a label). E.g. regression, random forests, decision trees |
What is unsupervised learning? | Descriptive learning: using data to group items/users into categories (e.g. extracting topics/categories from articles). It tries to infer hidden structure in the data without labeled training examples (no teacher, hence ‘unsupervised’). E.g. K-means clustering, decision tree clustering |
What is reinforcement learning? | Attempts to find the optimal action to maximize the expected reward/outcome (e.g. who should we show this kind of ad to, or who should we send this marketing email to?). |
What are the main differences between supervised, unsupervised, and reinforcement learning? | Supervised learning requires a labeled dataset for training. Unsupervised learning identifies hidden patterns in an unlabeled dataset. Reinforcement learning does not require a pre-collected dataset; it learns from the rewards it receives while interacting with an environment. |
What type of learning might be used for ChatGPT? | ChatGPT is built on language models developed using both supervised learning and reinforcement learning. |
What is A/B testing? | Splitting users or data at random into two groups, each given a different variant (treatment or model), to determine which performs better. It is a form of statistical two-sample hypothesis testing to see whether there is a difference between the samples/treatments. |
What are some examples of how A/B testing can be used? | |
What is linear regression? | Predictive analysis of a continuous response variable based on a set of explanatory variables. The model is fit by minimizing the sum of squared residuals and assumes a linear relationship between the explanatory variables and the response. |
What are decision trees? | A decision tree is another way of finding a "rule" or "model" that maps user attributes to an outcome. It is a type of flowchart that models decisions as a series of if-then statements. It can be used for both classification (categorical outcome) and regression (continuous outcome) problems. |
What are the differences between linear regression and decision trees? | Linear regression assumes a linear relationship between the dependent and independent variables and is used for continuous outcomes, while decision trees can handle both continuous and categorical data and do not assume a specific form of relationship. Decision trees are often used when the relationship is non-linear or when there are complex interactions between the predictors. |
How do you check for data quality issues in data science? | (1) Completeness: missing values, nulls (impute or remove them). (2) Fidelity: incorrect or erroneous data, inconsistent data types; is the data true to reality? (clean, validate, and audit the data). (3) Consistency: check for anomalies, outliers, and unusual patterns (e.g. duplicates); use statistical measures to detect outliers/anomalies. |
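
The completeness and consistency checks from the last card can be sketched in a few lines of Python. This is only a minimal illustration using the standard library: the toy `rows` data, the helper names (`missing_counts`, `duplicate_rows`, `zscore_outliers`), and the z-score threshold of 2.0 are all assumptions made for the example, not part of the course material.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical toy records; None marks a missing value.
rows = [
    {"user_id": 1, "age": 34, "spend": 20.0},
    {"user_id": 2, "age": None, "spend": 22.5},   # missing age
    {"user_id": 3, "age": 29, "spend": 19.0},
    {"user_id": 3, "age": 29, "spend": 19.0},     # exact duplicate of the row above
    {"user_id": 4, "age": 41, "spend": 21.0},
    {"user_id": 5, "age": 38, "spend": 500.0},    # suspiciously large value
    {"user_id": 6, "age": 25, "spend": 18.5},
    {"user_id": 7, "age": 52, "spend": 23.0},
]


def missing_counts(records):
    """Completeness: count missing (None) values per column."""
    counts = Counter()
    for record in records:
        for column, value in record.items():
            if value is None:
                counts[column] += 1
    return dict(counts)


def duplicate_rows(records):
    """Consistency: indices of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for index, record in enumerate(records):
        key = tuple(sorted(record.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates.append(index)
        else:
            seen.add(key)
    return duplicates


def zscore_outliers(values, threshold=2.0):
    """Consistency: values more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


print(missing_counts(rows))                           # {'age': 1}
print(duplicate_rows(rows))                           # [3]
print(zscore_outliers([r["spend"] for r in rows]))    # [500.0]
```

Fidelity is harder to automate, since judging whether data is true to reality usually needs domain rules or an external source to validate against. The z-score rule is also only one of several statistical measures; on very small samples a single extreme value inflates the standard deviation, so a median- or IQR-based rule may be more robust.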