
# QM Exam 1

Term | Definition |
---|---|

data warehouses | vast digital repositories that record and store data electronically |

Big Data | describe data sets so large that traditional methods of storage and analysis are inadequate |

transactional data | data collected for recording a company's transactions |

data mining or predictive analytics | process of using data, especially transactional data, to make decisions and predictions |

business analytics | describes any use of data and statistical analysis to drive business decisions from data whether the purpose is predictive or simply descriptive |

data | numerical, alphabetic, or alphanumerical; useless unless we know what it represents |

context | answering the questions who, what, when, why, where, and how can make data values meaningful |

data table | clearly shows who the data are about and what was measured |

cases | rows of a data table correspond to individual __________ |

variables | some recorded characteristics |

respondents | individuals who answer a survey |

subjects/participants | people on whom we experiment |

experimental units | animals, plants, websites, and other inanimate subjects |

records | rows in a database |

metadata | typically contains information about how, when, and where (and possibly why) the data were collected; who each case represents; and the definitions of all variables |

spreadsheet | a name that comes from bookkeeping ledgers of financial information |

relational database | two or more separate data tables are linked together so that information can be merged across them |

categorical/qualitative variable | when the values of a variable are simply the names of categories |

quantitative variable | when the values of a variable are measured numerical quantities |

identifier variables | categorical variables whose only purpose is to assign a unique identifier code to each individual in the data set |

ordinal | the variable is ______________ when the values of a categorical variable have an intrinsic order |

nominal | categorical variable with unordered categories |

cross-sectional data | several variables are measured at the same time point |

frequency table | records the counts for each of the categories of the variable |

area principle | says that the area occupied by a part of the graph should correspond to the magnitude of the value it represents |

bar chart | displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison |

relative frequency bar chart | replaces the counts with percentages in order to draw attention to the relative proportion in each category |

pie chart | shows how a whole group breaks into several categories |

contingency tables | they show how individuals are distributed along each variable depending on, or contingent on, the value of the other variable |

marginal distribution | when presented like this, at the margins of a contingency table, the frequency distribution of either one of the variables is called __________ |

cell | any intersection of a row and column of the table; gives the count for a combination of values of the two variables |

total percent, row percent, or column percent | the choices of percentage that most statistics programs offer for contingency tables |

conditional distribution | shows the distribution of one variable for just those cases that satisfy a condition on another |

independent | in a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the two variables are ________ |

segmented (or stacked) bar chart | treats each bar as the "whole" and divides it proportionally into segments corresponding to the percentage in each group |

mosaic plot | looks like a segmented bar chart, but obeys the area principle better by making the bars proportional to the sizes of the groups |

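Simpson's Paradox | a misleading or reversed comparison that can arise when percentages or averages are combined across different groups; to avoid it, only combine compatible measurements for comparable individuals |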

bins | intervals that slice up the possible values of a quantitative variable; together with their counts, they give the distribution of the variable and provide the building blocks for the display of the distribution called a histogram |

histogram | plots the bin counts as the heights of bars |

gaps | indicate a region where there are no values |

relative frequency histogram | an alternative to the histogram that reports the percentage of cases in each bin instead of the count |

stem-and-leaf displays | like histograms, but they also show the individual values |

quantitative data condition | the data must be values of a quantitative variable whose units are known |

shape, center, and spread | when you describe a distribution, you should pay attention to these three things |

shape | we describe the shape of a distribution in terms of its modes, its symmetry, and whether it has any gaps or outlying values |

modes | humps of a histogram |

unimodal | a distribution whose histogram has one main hump |

bimodal | distributions whose histograms have two humps |

multimodal | histograms with three or more humps |

uniform | a distribution whose histogram doesn't appear to have any mode and in which all the bars are approximately the same height |

symmetric | the halves of a distribution on either side of the center look, at least approximately, like mirror images |

tails | the (usually) thinner ends of a distribution |

skewed | if one tail stretches out farther than the other, the distribution is said to be ________ to the side of the longer tail |

outliers | any stragglers that stand off away from the body of the distribution |

mean (average) | add up all the values of the variable, x, and divide that sum by the number of data values |

median | the value that splits the histogram into two equal areas |

range | the difference between the extremes: max-min |

lower quartile (Q1) | value for which one quarter of the data lie below it |

upper quartile (Q3) | value for which one quarter of the data lie above it |

interquartile range (IQR) | summarizes the spread by focusing on the middle half of the data; it's defined as the difference between the two quartiles: Q3-Q1 |

variance | the average (almost: the sum is divided by n-1 rather than n) of the squared deviations from the mean |

standard deviation | we want measures of spread to have the same units as the data, so we usually take the square root of the variance, giving the __________ |

standardized value | the value that results from subtracting the mean and dividing by the standard deviation |

z-score | tells us how many standard deviations a value is from its mean |

five-number summary | reports a distribution's median, quartiles, and extremes (max and min) |

boxplot | displays the information from a five-number summary |

stationary | when a time series has no strong trend or change in variability |

time series plot | a display of values against time |

re-express/transform | one way to make a skewed distribution more symmetric is to ___________ the data by applying a simple function to all the data values |

scatterplot | plots one quantitative variable against another |

direction | pattern that can either be negative, positive, or neither |

form | straight, curved, exotic, no pattern? |

straight line relationship/linear form | will appear as a cloud or swarm of points stretched out in a generally consistent, straight form |

strength | tightly clustered in a single stream or so variable and spread out that we can barely discern a trend or pattern? |

explanatory or predictor variable | variable on the x-axis |

response variable | variable on the y-axis |

independent and dependent variables | the idea is that the y-variable depends on the x-variable and the x-variable acts independently to make y respond |

correlation coefficient | a numerical measure of the direction and strength of a linear association |

correlation | measures the strength of the linear association between two quantitative variables |

quantitative variables condition | correlation applies only to quantitative variables |

linearity condition | correlation measures the strength only of the linear association and will be misleading if the relationship is not straight enough |

outlier condition | unusual observations can distort the correlation and can make an otherwise small correlation look big or, on the other hand, hide a large correlation |

lurking variable | some third variable that affects both of the variables you have observed |

linear model | just an equation of a straight line through the data |

predicted value | the prediction for y found for each x-value in the data; found by substituting the x-value in the regression equation; values on the fitted line |

residual | the difference between the observed value and the predicted value: observed y minus predicted y |

line of best fit/least squares line | the line for which the sum of the squared residuals is smallest |

slope | b1, given in y-units per x-unit; a difference of one unit in x is associated with a difference of b1 units in the predicted value of y |

intercept | the value of the line when the x-variable is zero |

regression lines | common name for least squares lines |

regression to the mean | because the correlation is always less than 1.0 in magnitude, each predicted y tends to be fewer standard deviations from its mean than its corresponding x is from its mean |

quantitative data condition | pretty easy to check, but don't be fooled by categorical data recorded as numbers |

linearity assumption | the regression model assumes that the relationship between the variables is, in fact, linear |

linearity condition | the two variables must have a linear association, or the model won't mean a thing and decisions you base on the model may be wrong |

outlier condition | make sure that no points need special attention |

independence assumption | assumption that the residuals are independent of each other |

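equal spread condition | the spread (standard deviation) of the residuals around the line should be about the same for all x-values; this condition checks the regression model's assumption about the standard deviation around the line |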

R-squared | the fraction of the variability in y accounted for by the regression model; all regression analyses include this statistic and, by tradition, it is written with a capital letter; a fraction of a whole, it is often given as a percentage |

Spearman rank correlation | works with the ranks of the data rather than their values |

random phenomena | we can't predict the individual outcomes, but we can hope to understand characteristics of their long-run behavior |

trial | each attempt of a random phenomenon |

outcome | generated by each trial of a random phenomenon |

event | more general term to refer to outcomes or combinations of outcomes |

sample space | a special event; the collection of all possible outcomes |

probability | the long-run relative frequency of an event (for example, the percentage of the callers who qualify) |

independence | the outcome of one trial doesn't influence or change the outcome of another |

Law of Large Numbers (LLN) | states that if the events are independent, then as the number of trials increases, the long-run relative frequency of any outcome gets closer and closer to a single value |

empirical probability | because it is based on repeatedly observing the event's outcome, this definition of probability is often called ____________ |

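theoretical probability | probability that comes from a model rather than from observation, such as when we have equally likely outcomes: P(A) = (number of outcomes in A) / (total number of outcomes) |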

personal probability | a subjective probability that expresses a personal degree of belief rather than a long-run relative frequency |

probability | a number between 0 and 1 |

probability assignment rule | the probability of the set of all possible outcomes must be 1. P(S) =1 |

complement rule | the probability of an event occurring is 1 minus the probability that it doesn't occur. P(A)=1-P(A^c) |

multiplication rule | to find the probability that two independent events occur, we multiply the probabilities; P(A and B)=P(A) x P(B), provided that A and B are independent |

disjoint or mutually exclusive | two events are _________ if they have no outcome in common |

addition rule | allows us to add the probabilities of disjoint events to get the probability that either event occurs: P(A or B)=P(A) + P(B), provided that A and B are disjoint |

general addition rule | does not require disjoint events: P(A or B)=P(A) + P(B) - P(A and B) for any two events A and B |

marginal probability | uses a marginal frequency (from either the total row or total column) to compute the probability |

joint probabilities | probability that two events occur together |

conditional probability | a probability that takes into account a given condition |

general multiplication rule | a rule for compound events that does not require the events to be independent: P(A and B)=P(A) x P(B|A) for any two events A and B |

independent | events A and B are __________ whenever P(B|A)=P(B) |

tree diagram | probability tree used to help think through the decision-making process |
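
The probability terms above (marginal, joint, and conditional probability, the general addition and multiplication rules, and independence) can be checked on a small contingency table. The sketch below is only an illustration: the counts, the event labels `A` and `B`, and the helper function names are all made up for this example and do not come from the course material.

```python
from fractions import Fraction

# Hypothetical contingency table of counts (invented for illustration).
# Keys are (row category, column category).
counts = {
    ("A", "B"): 30,
    ("A", "not B"): 20,
    ("not A", "B"): 30,
    ("not A", "not B"): 20,
}
total = sum(counts.values())


def joint(a, b):
    """Joint probability: the count in one cell divided by the grand total."""
    return Fraction(counts[(a, b)], total)


def marginal_row(a):
    """Marginal probability of a row category (sum across its cells)."""
    return sum(joint(a, b) for b in ("B", "not B"))


def marginal_col(b):
    """Marginal probability of a column category (sum down its cells)."""
    return sum(joint(a, b) for a in ("A", "not A"))


def conditional_col_given_row(b, a):
    """Conditional probability P(b | a) = P(a and b) / P(a)."""
    return joint(a, b) / marginal_row(a)


p_a = marginal_row("A")
p_b = marginal_col("B")
p_a_and_b = joint("A", "B")
p_b_given_a = conditional_col_given_row("B", "A")

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = p_a + p_b - p_a_and_b

# General multiplication rule: P(A and B) = P(A) x P(B|A)
multiplication_rule_holds = p_a_and_b == p_a * p_b_given_a

# Independence: A and B are independent whenever P(B|A) = P(B)
independent = p_b_given_a == p_b

print("P(A) =", p_a)
print("P(B) =", p_b)
print("P(A and B) =", p_a_and_b)
print("P(A or B) =", p_a_or_b)
print("P(B|A) =", p_b_given_a)
print("independent:", independent)
```

With these made-up counts, P(B|A) equals P(B), so the two events come out independent; changing any single cell count would normally break that equality. Exact fractions are used instead of floats so the equality checks are not affected by rounding.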