Question 1

What is Data Science?

Accepted Answer

A cross-disciplinary practice that draws on methods from data engineering, descriptive statistics, data mining, machine learning, and predictive analysis

Question 2

What are the stages of a data science project?

Accepted Answer

Get, Explore, Clean, Visualize, Train & Test

Question 3

What is the approach when data is expensive?

Accepted Answer

Define goal/problem first, then explore data

Question 4

What is the approach when data is cheap?

Accepted Answer

Explore data first, then see how it can be used

Question 5

What is a Primary Data source?

Accepted Answer

Data collected directly by a researcher

Question 6

Example of a primary data source

Accepted Answer

HMDA, Census

Question 7

What is a Secondary Data source?

Accepted Answer

Data collected by someone else

Question 8

Secondary Data source example

Accepted Answer

Kaggle, UCI ML Repository, Data.gov

Question 9

Problems to look for when exploring data

Accepted Answer

Missing/unusual values, anomalies/outliers, lack of diversity/representation, bias

Question 10

How can missing values be handled?

Accepted Answer

Imputing with a measure of central tendency (mean, median, or mode)
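
A minimal Python sketch of this kind of imputation, assuming missing entries are represented as `None`. The `impute` helper and the `ages` list are hypothetical, made up for illustration only.

```python
from statistics import mean, median, mode

def impute(values, strategy="mean"):
    """Replace None entries with the mean, median, or mode of the observed values."""
    observed = [v for v in values if v is not None]
    fill = {"mean": mean, "median": median, "mode": mode}[strategy](observed)
    return [fill if v is None else v for v in values]

# Hypothetical data with two missing entries.
ages = [22, None, 30, 22, None, 40]

print(impute(ages, "mean"))    # [22, 28.5, 30, 22, 28.5, 40]
print(impute(ages, "median"))  # [22, 26.0, 30, 22, 26.0, 40]
print(impute(ages, "mode"))    # [22, 22, 30, 22, 22, 40]
```

The median is usually the safer fill when the column has outliers, since the mean is pulled toward extreme values.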

Question 11

Why are unbalanced distributions problematic?

Accepted Answer

They bias models, especially classification models, making them less effective
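
One way to see the problem, sketched in Python under assumed numbers: with a hypothetical 95/5 class split, a "model" that always predicts the majority class looks accurate while never finding a single minority example. The labels and the always-majority predictor are illustrative, not taken from the course material.

```python
from collections import Counter

# Hypothetical labels: 95 negatives and 5 positives.
labels = ["neg"] * 95 + ["pos"] * 5

counts = Counter(labels)
majority_class = counts.most_common(1)[0][0]

# A "model" that ignores its input and always predicts the majority class.
predictions = [majority_class] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
positives_caught = sum(
    p == "pos" and y == "pos" for p, y in zip(predictions, labels)
)

print(accuracy)          # 0.95
print(positives_caught)  # 0
```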

Question 12

Difference between correlation and causation

Accepted Answer

correlation is a relationship/pattern between two variables and causation is the principle where one event causes another

Question 13

What is a spurious correlation?

Accepted Answer

Two variables appear to be correlated but have no causal relationship

Question 14

Three main types of machine learning

Accepted Answer

Supervised, unsupervised, reinforcement

Question 15

What is supervised learning?

Accepted Answer

learning from labeled data

Question 16

What is unsupervised learning?

Accepted Answer

learning from finding patterns in unlabeled data

Question 17

What is reinforcement learning?

Accepted Answer

learns decisions via environment interaction

Question 18

What is Qualitative Data?

Accepted Answer

Data that cannot be measured numerically; sorted by categories.

Question 19

Examples of Qualitative Data

Accepted Answer

Audio, Images, text, race

Question 20

What is Nominal Data?

Accepted Answer

Categorical data with no order or inherent ranking

Question 21

Nominal Data example

Accepted Answer

sex, race, hair color

Question 22

What is ordinal data?

Accepted Answer

Categorical data with ordering, but differences between ranks may not be equal

Question 23

Ordinal Data example

Accepted Answer

rankings, letter grades, satisfaction levels

Question 24

What is Quantitative Data?

Accepted Answer

Numerical data that can be measured or counted

Question 25

Quantitative data example

Accepted Answer

height, weight, income

Question 26

What is Continuous Data?

Accepted Answer

Quantitative data that can take any value (measured)

Question 27

Continuous data example

Accepted Answer

temperature, GPA

Question 28

What is discrete data?

Accepted Answer

Quantitative data that takes specific countable values

Question 29

Discrete data example

Accepted Answer

number of dogs, birth year

Question 30

What is structured data?

Accepted Answer

Data organized into rows and columns in relational databases; about 20% of enterprise data; easier to manage

Question 31

Structured Data example

Accepted Answer

numbers, dates, strings

Question 32

What is Unstructured Data?

Accepted Answer

Data not stored in relational databases and lacking a schema; about 80% of enterprise data

Question 33

Unstructured Data example

Accepted Answer

NoSQL databases (MongoDB, Couchbase, Cassandra)

Question 34

Define Mean

Accepted Answer

Average value of a dataset

Question 35

Arithmetic mean formula

Accepted Answer

sum of values / number of values

Question 36

Weighted mean

Accepted Answer

Accounts for different weights of values: sum(val x weight)/sum(weight)
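
The formula above can be sketched in Python as follows. The `weighted_mean` helper, the scores, and the weights are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
def weighted_mean(values, weights):
    """Return sum(value * weight) / sum(weight)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical course: exam, project, homework weighted 5 : 3 : 2.
scores = [90, 80, 70]
weights = [5, 3, 2]

print(weighted_mean(scores, weights))  # (450 + 240 + 140) / 10 = 83.0
```

With equal weights the result reduces to the ordinary arithmetic mean.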

Question 37

median

Accepted Answer

middle value of dataset when ordered

Question 38

mode

Accepted Answer

values that occur most frequently

Question 39

Unimodal

Accepted Answer

1 mode

Question 40

Bimodal

Accepted Answer

2 modes

Question 41

Multimodal

Accepted Answer

2+ modes
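
A short sketch of the three measures of central tendency using Python's standard `statistics` module. The `data` list is a made-up example; `multimode` returns every value tied for most frequent, so the length of its result shows whether the data is unimodal or bimodal.

```python
from statistics import mean, median, multimode

# Hypothetical dataset with two values tied for most frequent.
data = [2, 3, 3, 5, 7, 7, 8]

modes = multimode(data)

print(mean(data))    # 35 / 7 = 5
print(median(data))  # middle of the 7 ordered values = 5
print(modes)         # [3, 7]
print(len(modes))    # 2, so this dataset is bimodal
```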

Question 42

Range

Accepted Answer

max-min

Question 43

Percentile

Accepted Answer

value below which a percentage of data falls

Question 44

Percentile example

Accepted Answer

95th percentile = scored better than 95% of test takers

Question 45

IQR

Accepted Answer

Interquartile Range: 75th percentile minus 25th percentile
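
A small Python sketch of the IQR using `statistics.quantiles`. Textbooks and libraries use slightly different conventions for locating percentiles; the `method="inclusive"` choice and the `data` list here are assumptions made for illustration, so other tools may give slightly different quartiles on the same data.

```python
from statistics import quantiles

# Hypothetical dataset.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into quartiles and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3)  # 3.0 5.0 7.0
print(iqr)         # 4.0
```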

Question 46

Variance

Accepted Answer

Average squared deviation from mean; measures spread

Question 47

Difference between population vs sample variance

Accepted Answer

population uses n in denominator; sample uses n-1

Question 48

Why is standard deviation easier to interpret than variance?

Accepted Answer

It is in the same units as the data (sqrt of variance)
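
The two denominators and the square-root step can be sketched in plain Python. The `variance` helper and the `data` values are hypothetical; the standard library's `statistics.pvariance` and `statistics.variance` compute the same two quantities.

```python
from math import sqrt

def variance(data, sample=False):
    """Average squared deviation from the mean.

    Population variance divides by n; sample variance divides by n - 1.
    """
    n = len(data)
    if n < 2 and sample:
        raise ValueError("sample variance needs at least two values")
    mean = sum(data) / n
    squared_deviations = sum((x - mean) ** 2 for x in data)
    return squared_deviations / (n - 1 if sample else n)

# Hypothetical measurements (say, in centimetres).
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = variance(data)                 # 32 / 8, in squared centimetres
samp_var = variance(data, sample=True)   # 32 / 7, slightly larger
pop_std = sqrt(pop_var)                  # back in centimetres

print(pop_var)   # 4.0
print(samp_var)
print(pop_std)   # 2.0
```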

Question 49

Experiment

Accepted Answer

Repeatable procedure with defined outcomes

Question 50

Experiment example

Accepted Answer

Tossing a coin

Question 51

Sample space

Accepted Answer

set of all possible outcomes

Question 52

Event

Accepted Answer

Subset of sample space

Question 53

Simple event

Accepted Answer

one outcome

Question 54

Compound event

Accepted Answer

multiple outcomes

Question 55

mutually exclusive events

Accepted Answer

events that cannot occur together (disjoint)

Question 56

3 axioms of probability

Accepted Answer

Non-negativity, Unity, Additivity

Question 57

Non-Negativity

Accepted Answer

P(A) ≥ 0

Question 58

Unity

Accepted Answer

P(S) = 1

Question 59

Additivity

Accepted Answer

If A and B are disjoint, P(A ∪ B) = P(A) + P(B)

Question 60

State the complement rule

Accepted Answer

P(not A) = 1 – P(A).

Question 61

State the general addition rule.

Accepted Answer

P(A ∪ B) = P(A) + P(B) – P(A ∩ B).
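
A quick check of the complement and general addition rules, sketched in Python for one roll of a fair six-sided die. The events `A` and `B` and the `prob` helper are hypothetical examples; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Sample space for one roll of a fair die; every outcome is equally likely.
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) as (outcomes in the event) / (outcomes in the sample space)."""
    return Fraction(len(event & S), len(S))

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3

# Complement rule: P(not A) = 1 - P(A)
print(prob(S - A), 1 - prob(A))                # 1/2 1/2

# General addition rule: subtract the overlap so {4, 6} is not counted twice.
print(prob(A | B))                             # 2/3
print(prob(A) + prob(B) - prob(A & B))         # 2/3
```

When A and B are disjoint, P(A ∩ B) is 0 and the rule reduces to the additivity axiom.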

Question 62

What is conditional probability?

Accepted Answer

P(A|B) = probability of A given B.

Question 63

Define dependent vs independent events.

Accepted Answer

Dependent: probability of one affects the other. Independent: no effect.
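
A sketch of conditional probability and an independence check for one roll of a fair die, using the standard definition P(A|B) = P(A ∩ B) / P(B), which is not spelled out on the cards above. The events and the `prob` and `cond_prob` helpers are hypothetical examples.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a and b) / P(b); requires P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("cannot condition on an impossible event")
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # roll is even
B = {4, 5, 6}  # roll is greater than 3
C = {1, 2}     # roll is at most 2

# Knowing B changes the probability of A, so A and B are dependent.
print(prob(A), cond_prob(A, B))  # 1/2 2/3

# Knowing C leaves the probability of A unchanged, so A and C are independent.
print(prob(A), cond_prob(A, C))  # 1/2 1/2
```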

Question 64

What does exhaustive mean in probability?

Accepted Answer

Events that cover the entire sample space (one must happen).

CSI 343 Exam 1