BA1
| Term | Definition |
|---|---|
| Analytics | the scientific process of transforming data into insight for making better decisions. (INFORMS) |
| Descriptive Analytics | classify or categorize data |
| Diagnostic Analytics | diagnose the problem |
| Predictive Analytics | predict future outcomes or behaviors |
| Prescriptive Analytics | make recommendations (or decisions) |
| Rows | describe a collection of things (items or people) and are called observations or cases |
| unit of observation | tells us what each row of the dataset represents |
| Columns | describe common attributes shared by the items (or people) and are called variables or features |
| number of rows (n) | sample size |
| Cross-Sectional Data | Data collected for many subjects at the same point in time (or without regard to the differences in time) |
| Time Series Data | Data collected over several time periods for specific groups of people, objects, or events |
| Structured Data | Follow a pre-defined row and column format |
| Unstructured Data | Do not conform to a pre-defined row and column format (e.g., textual data or multimedia content) |
| Nominal Scale | name categories without implying order (the assignment of numbers is arbitrary) |
| Ordinal Scale | name categories that can be ordered or ranked; usually expressed in words but coded into numbers for data processing (e.g., a 5-star rating scale) |
| Interval scale | numerical values that can be added or subtracted |
| Ratio scale | numerical values that can be added, subtracted, multiplied or divided (makes ratio comparisons possible) |
| A frequency table | summarizes the distribution of a categorical variable by listing each group or category along with its count (frequency); may also show the proportion or percentage in each category |
| five-number summary | The five values shown in a box plot: minimum, 1st quartile, median, 3rd quartile, maximum |
| Median | Value in the middle of a sorted list of numerical values (a typical value); 50th percentile |
| Interquartile Range (IQR) | 3rd quartile minus 1st quartile (the length of the box in a box plot) |
| Range | Max - Min |
| histograms | used to examine the shape of a distribution, such as its symmetry or skew |
| Visualize Categorical Data | Bar charts, pie charts |
| Visualize Quantitative data | Box and whisker chart, Histogram |
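| mean | Arithmetic average; the sum of the values divided by the number of values |
| Variance | Measures how far the values spread around the mean: the sum of squared deviations from the mean divided by n - 1, $s^2 = \frac{(y_1 - \bar{y})^2 + (y_2 - \bar{y})^2 + \cdots + (y_n - \bar{y})^2}{n - 1}$ |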
| standard deviation | a measure of variability in the original units of the data (the variance results in squared units); square root of the variance |
| symmetrical histogram | A distribution is symmetric if the two sides of its histogram are mirror images (e.g., bell-shaped) |
| Skewed histogram | A distribution is skewed if one tail of the histogram stretches out farther than the other; the longer tail points in the direction of the skew. Skewed left if it looks like a right foot; skewed right if it looks like a left foot |
| empirical rule | uses the standard deviation to describe how data with a bell-shaped distribution cluster around the mean (about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean) |
| Marginal Probabilities | the probability of observing an outcome with a single attribute, regardless of its other attributes |
| independent events | Two events are independent if the occurrence of one does not affect the chances for the occurrence of the other |
| Multiplication Rule | Two events A and B are independent if the probability that both A and B occur is the product of the probabilities of the two events: $P(A \text{ and } B) = P(A) \times P(B)$ |
| random variable | a function that assigns a numerical value to each point in the sample space. |
| Discrete | A random variable that takes on one of a list of possible values, such as counts (e.g., 1, 2, 3, 4) |
| Continuous | A random variable that takes on any value in an interval (e.g., 1.5, 2.75, 3.88) |
| probability mass functions | give the probability of each possible outcome of a discrete random variable |
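To make the frequency table card concrete, here is a minimal Python sketch that counts each category of a categorical variable and converts the counts to percentages. The survey responses and the `frequency_table` helper are hypothetical, chosen only for illustration.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percent) rows, most frequent category first."""
    counts = Counter(values)
    n = len(values)
    return [(category, count, 100 * count / n) for category, count in counts.most_common()]

# Hypothetical responses to a nominal variable (preferred contact method)
responses = ["email", "phone", "email", "text", "email", "phone"]
for category, count, pct in frequency_table(responses):
    print(f"{category:<6} {count:>2} {pct:5.1f}%")
```

The percentages always sum to 100 (up to rounding), since every observation falls in exactly one category.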
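The five-number summary, IQR, and range cards can be checked with a short Python sketch. The sample values and the `five_number_summary` helper are hypothetical; quartiles here come from the standard library's `statistics.quantiles` with `method="inclusive"`. Quartile conventions differ between tools, so a textbook or spreadsheet may report slightly different Q1 and Q3 for the same data.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the 'inclusive' quartile method."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

values = [7, 2, 15, 4, 10, 5, 12, 4, 9]  # hypothetical sample, n = 9
lo, q1, med, q3, hi = five_number_summary(values)
iqr = q3 - q1          # length of the box in a box plot
data_range = hi - lo   # Max - Min
print(lo, q1, med, q3, hi)  # 2 4.0 7.0 10.0 15
print(iqr, data_range)      # 6.0 13
```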
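The mean, variance, and standard deviation cards can be sketched in Python by computing the sample variance directly from its formula (the sum of squared deviations from the mean, divided by n - 1) and cross-checking it against `statistics.variance` and `statistics.stdev`, which also use the n - 1 denominator. The sample data and the `sample_variance` helper are hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((y - mean) ** 2 for y in values) / (n - 1)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, n = 8
mean = sum(values) / len(values)   # 40 / 8 = 5.0
s2 = sample_variance(values)       # 32 / 7, in squared units
s = math.sqrt(s2)                  # standard deviation, back in the original units

# Cross-check against the standard library's sample (n - 1) versions
assert math.isclose(s2, statistics.variance(values))
assert math.isclose(s, statistics.stdev(values))
print(mean, s2, s)
```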
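The empirical rule can be illustrated by simulation. This sketch draws hypothetical normally distributed data (mean 100 and SD 15, chosen arbitrarily) with Python's `random.gauss` and measures what share of the values falls within 1, 2, and 3 standard deviations of the mean. For bell-shaped data the shares should come out near 68%, 95%, and 99.7%, though the exact figures vary with the random sample.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable
# Hypothetical bell-shaped data: 50,000 draws from a normal distribution (mean 100, SD 15)
data = [random.gauss(100, 15) for _ in range(50_000)]

mean = statistics.fmean(data)
sd = statistics.stdev(data)

# Share of values within k standard deviations of the mean, for k = 1, 2, 3
shares = {}
for k in (1, 2, 3):
    shares[k] = sum(abs(x - mean) <= k * sd for x in data) / len(data)
    print(f"within {k} SD: {shares[k]:.1%}")
```

Running the same loop on strongly skewed data would generally give shares that depart from these benchmarks, which is why the rule is stated only for bell-shaped distributions.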
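The marginal probability, independence, and multiplication rule cards fit together in a small worked example. The 2x2 table of counts is hypothetical (200 imaginary customers classified by whether they bought online and whether they are repeat customers), and `fractions.Fraction` is used so the product comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction

# Hypothetical 2x2 table of counts: (online purchase?, repeat customer?) -> count
counts = {
    (True, True): 30,
    (True, False): 90,
    (False, True): 20,
    (False, False): 60,
}
total = sum(counts.values())  # 200

# Marginal probabilities: sum over the other attribute, ignoring it
p_a = Fraction(sum(c for (a, _), c in counts.items() if a), total)  # P(online)
p_b = Fraction(sum(c for (_, b), c in counts.items() if b), total)  # P(repeat)
p_ab = Fraction(counts[(True, True)], total)                        # P(online and repeat)

print(p_a, p_b, p_ab)                     # 3/5 1/4 3/20
print("independent:", p_ab == p_a * p_b)  # independent: True
```

Changing any single cell count (while keeping the total) would generally break the equality, and the events would then be dependent.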
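The random variable, discrete, and probability mass function cards can be tied together by enumerating a sample space. In this sketch (the coin-toss setup and the `num_heads` function are illustrative assumptions), X is the number of heads in three tosses of a fair coin, and the PMF assigns a probability to each possible value of X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of tossing a fair coin 3 times (8 equally likely points)
sample_space = list(product("HT", repeat=3))

def num_heads(outcome):
    """The random variable X: assigns a number (count of heads) to each outcome."""
    return outcome.count("H")

# PMF: P(X = x) for each possible value x of the discrete random variable X
counts = Counter(num_heads(o) for o in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
assert sum(pmf.values()) == 1  # a valid PMF sums to 1
```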