
QM Exam 1

data warehouses vast digital repositories that record and store data electronically
Big Data describes data sets so large that traditional methods of storage and analysis are inadequate
transactional data data collected for recording the companies' transactions
data mining or predictive analytics process of using data, especially transactional data, to make decisions and predictions
business analytics describes any use of data and statistical analysis to drive business decisions from data, whether the purpose is predictive or simply descriptive
data numerical, alphabetic, or alphanumerical; useless unless we know what it represents
context answering the questions who, what, when, where, why, and how can make data values meaningful
data table clearly shows who the data was about and what was measured
cases rows of a data table correspond to individual __________
variables some recorded characteristics
respondents individuals who answer a survey
subjects/participants people on whom we experiment
experimental units animals, plants, websites, and other inanimate subjects
records rows in a database
metadata typically contains information about how, when, and where (and possibly why) the data were collected; who each case represents; and the definitions of all variables
spreadsheet a name that comes from bookkeeping ledgers of financial information
relational database two or more separate data tables are linked together so that information can be merged across them
categorical/qualitative variable when the values of a variable are simply the names of categories
quantitative variable when the values of a variable are measured numerical quantities
identifier variables categorical variables whose only purpose is to assign a unique identifier code to each individual in the data set
ordinal the variable is ______________ when the values of a categorical variable have an intrinsic order
nominal categorical variable with unordered categories
cross-sectional data several variables are measured at the same time point
frequency table records the counts for each of the categories of the variable
area principle says that the area occupied by a part of the graph should correspond to the magnitude of the value it represents
bar chart displays the distribution of a categorical variable, showing the counts for each category next to each other for easy comparison
relative frequency bar chart replaces the counts with percentages to draw attention to the relative proportion in each category (e.g., visits from each Source)
pie chart shows how a whole group breaks into several categories
contingency tables they show how individuals are distributed along each variable depending on, or contingent on, the value of the other variable
marginal distribution when presented like this, at the margins of a contingency table, the frequency distribution of either one of the variables is called __________
cell any intersection of a row and column of the table; gives the count for a combination of values of the two variables
total percent, row percent, or column percent most statistics programs offer a choice for contingency tables
conditional distribution shows the distribution of one variable for just those cases that satisfy a condition on another
independent in a contingency table, when the distribution of one variable is the same for all categories of another variable, we say that the two variables are ________
segmented (or stacked) bar chart treats each bar as the "whole" and divides it proportionally into segments corresponding to the percentage in each group
mosaic plot looks like a segmented bar chart, but obeys the area principle better by making the bars proportional to the sizes of the groups
Simpson's Paradox a trend seen within groups can reverse when the groups are combined; to avoid it, only combine compatible measurements for comparable individuals
bins give the distribution of the quantitative variable and provide the building blocks for the display of the distribution called a histogram
histogram plots the bin counts as the heights of bars
gaps indicate a region where there are no values
relative frequency histogram alternative is to report the percentage of cases in each bin
stem-and-leaf displays like histograms, but they also show the individual values
quantitative data condition the data must be values of a quantitative variable whose units are known
shape, center, and spread when you describe a distribution, you should pay attention to these three things
shape we describe the shape of a distribution in terms of its modes, its symmetry, and whether it has any gaps or outlying values
modes humps of a histogram
unimodal a distribution whose histogram has one main hump
bimodal distributions whose histograms have two humps
multimodal histograms with three or more humps
uniform a distribution whose histogram doesn't appear to have any mode and in which all the bars are approximately the same height
symmetric the halves of a distribution on either side of the center look, at least approximately, like mirror images
tails the (usually) thinner ends of a distribution
skewed if one tail stretches out farther than the other, the distribution is said to be ________ to the side of the longer tail
outliers any stragglers that stand off away from the body of the distribution
mean (average) add up all the values of the variable, x, and divide that sum by the number of data values
median the value that splits the histogram into two equal areas
range the difference between the extremes: max-min
lower quartile (Q1) value for which one quarter of the data lie below it
upper quartile (Q3) value for which one quarter of the data lie above it
interquartile range (IQR) summarizes the spread by focusing on the middle half of the data; it's defined as the difference between the two quartiles: Q3-Q1
variance the average of the squared deviations
standard deviation we want measures of spread to have the same units as the data, so we usually take the square root of the variance, giving the __________
standardized value the result of subtracting the mean from a value and dividing by the standard deviation
z-score tells us how many standard deviations a value is from its mean
five-number summary reports a distribution's median, quartiles, and extremes (max and min)
boxplot displays the information from a five-number summary
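The center and spread cards above (mean, median, range, quartiles, IQR, variance, standard deviation, z-score, five-number summary) can be tried out on a small data set. This is a minimal sketch using Python's standard `statistics` module, under a few assumptions that are not on the cards: the data values are made up, the variance divides by n - 1 (the sample variance), and the quartiles use the "inclusive" interpolation method, which is only one of several conventions.

```python
import statistics


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Quartile conventions differ between textbooks and software; the
    # "inclusive" method here is one common choice, not the only one.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def z_scores(values):
    """Standardize each value: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return [(v - mean) / sd for v in values]


data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical values, chosen for easy arithmetic

summary = five_number_summary(data)
data_range = summary[4] - summary[0]   # range: max - min
iqr = summary[3] - summary[1]          # IQR: Q3 - Q1
variance = statistics.variance(data)   # sample variance
std_dev = statistics.stdev(data)       # square root of the variance, in data units
standardized = z_scores(data)

print("five-number summary:", summary)
print("range:", data_range, "IQR:", iqr)
print("variance:", variance, "standard deviation:", round(std_dev, 3))
print("z-score of the maximum:", round(standardized[-1], 3))
```

For this made-up data the mean is 6 and the sample variance is 10, so the maximum value 11 sits 5 / sqrt(10), or about 1.58, standard deviations above the mean.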
stationary when a time series has no strong trend or change in variability
time series plot a display of values against time
re-express/transform one way to make a skewed distribution more symmetric is to ___________ the data by applying a simple function to all the data values
scatterplot plots one quantitative variable against another
direction pattern that can either be negative, positive, or neither
form straight, curved, exotic, no pattern?
straight line relationship/linear form will appear as a cloud or swarm of points stretched out in a generally consistent, straight form
strength tightly clustered in a single stream or so variable and spread out that we can barely discern a trend or pattern?
explanatory or predictor variable variable on the x-axis
response variable variable on the y-axis
independent and dependent variables the idea is that the y-variable depends on the x-variable and the x-variable acts independently to make y respond
correlation coefficient a numerical measure of the direction and strength of a linear association
correlation measures the strength of the linear association between two quantitative variables
quantitative variables condition correlation applies only to quantitative variables
linearity condition correlation measures the strength only of the linear association and will be misleading if the relationship is not straight enough
outlier condition unusual observations can distort the correlation and can make an otherwise small correlation look big or, on the other hand, hide a large correlation
lurking variable some third variable that affects both of the variables you have observed
linear model just an equation of a straight line through the data
predicted value the prediction for y found for each x-value in the data; found by substituting the x-value in the regression equation; values on the fitted line
residual the difference between the observed value and the predicted value (observed minus predicted)
line of best fit/least squares line the line for which the sum of the squared residuals is smallest
slope b1 is given in y-units per x-unit; differences of one unit in x are associated with differences of b1 units in predicted values of y
intercept the value of the line when the x-variable is zero
regression lines common name for least squares lines
regression to the mean because the correlation is always less than 1.0 in magnitude, each predicted y tends to be fewer standard deviations from its mean than its corresponding x is from its mean
quantitative data condition pretty easy to check, but don't be fooled by categorical data recorded as numbers
linearity assumption the regression model assumes that the relationship between the variables is, in fact, linear
linearity condition the two variables must have a linear association, or the model won't mean a thing and decisions you base on the model may be wrong
outlier condition make sure that no points need special attention
independence assumption assumption that the residuals are independent of each other
equal spread condition new assumption about the standard deviation around the line gives us this new condition
R-squared all regression analyses include this statistic, although by tradition, it is written with a capital letter; a fraction of a whole, it is often given as a percentage
Spearman rank correlation works with the ranks of the data rather than their values
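The correlation and regression cards above (correlation coefficient, least squares line, slope, intercept, predicted value, residual, R-squared) fit together in a few lines of arithmetic. The sketch below is illustrative only. It assumes made-up data and uses standard textbook formulas that the cards do not spell out: r as the sum of products of z-scores divided by n - 1, slope b1 = r * sy / sx, and intercept b0 = ybar - b1 * xbar. Residuals are computed as observed minus predicted, the usual sign convention.

```python
import statistics


def correlation(xs, ys):
    """Correlation coefficient r: sum of products of z-scores, divided by n - 1."""
    n = len(xs)
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sx, sy = statistics.stdev(xs), statistics.stdev(ys)
    return sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)) / (n - 1)


def least_squares_line(xs, ys):
    """Return (b0, b1) for the fitted line y_hat = b0 + b1 * x."""
    r = correlation(xs, ys)
    b1 = r * statistics.stdev(ys) / statistics.stdev(xs)  # slope, y-units per x-unit
    b0 = statistics.mean(ys) - b1 * statistics.mean(xs)   # value of the line at x = 0
    return b0, b1


# Hypothetical (x, y) pairs, not taken from the flashcards.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
b0, b1 = least_squares_line(xs, ys)
predicted = [b0 + b1 * x for x in xs]                       # values on the fitted line
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]  # observed minus predicted
r_squared = r ** 2                                          # fraction of variation accounted for

print("r =", round(r, 4))
print("line: y_hat =", round(b0, 4), "+", round(b1, 4), "* x")
print("residuals:", [round(e, 4) for e in residuals])
print("R-squared =", round(r_squared, 4))
```

For this data the fitted line is approximately y_hat = 2.2 + 0.6x with r close to 0.775, and the residuals sum to zero (up to rounding), as they do for any least squares line that includes an intercept.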
random phenomena we can't predict the individual outcomes, but we can hope to understand characteristics of their long-run behavior
trial each attempt of a random phenomenon
outcome generated by each trial of a random phenomenon
event more general term to refer to outcomes or combinations of outcomes
sample space a special event; the collection of all possible outcomes
probability the long-run relative frequency of an event (e.g., the percentage of the callers who qualify)
independence the outcome of one trial doesn't influence or change the outcome of another
Law of Large Numbers (LLN) states that if the events are independent, then as the number of trials increases, the long-run relative frequency of any outcome gets closer and closer to a single value
empirical probability because it is based on repeatedly observing the event's outcome, this definition of probability is often called ____________
theoretical probability when we have equally likely outcomes
personal probability we call this kind of probability subjective
probability a number between 0 and 1
probability assignment rule the probability of the set of all possible outcomes must be 1: P(S) = 1
complement rule the probability of an event occurring is 1 minus the probability that it doesn't occur: P(A) = 1 - P(A^c)
multiplication rule to find the probability that two independent events occur, we multiply the probabilities; P(A and B)=P(A) x P(B), provided that A and B are independent
disjoint or mutually exclusive two events are _________ if they have no outcome in common
addition rule allows us to add the probabilities of disjoint events to get the probability that either event occurs: P(A or B) = P(A) + P(B), provided that A and B are disjoint
general addition rule does not require disjoint events: P(A or B) = P(A) + P(B) - P(A and B) for any two events A and B
marginal probability uses a marginal frequency (from either the total row or total column) to compute the probability
joint probabilities probability that two events occur together
conditional probability a probability that takes into account a given condition
general multiplication rule for compound events that does not require the events to be independent: P(A and B)=P(A) x P(B|A) for any two events A and B
independent events A and B are __________ whenever P(B|A)=P(B)
tree diagram probability tree used to help think through the decision-making process
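The probability rules in the cards above can be checked on a small sample space. The sketch below is an illustration under stated assumptions: one roll of a fair six-sided die (equally likely outcomes, so theoretical probability applies), events represented as Python sets, and event names A, B, C chosen for this example only.

```python
from fractions import Fraction

# Sample space S: one roll of a fair six-sided die (assumed for illustration).
S = frozenset(range(1, 7))


def prob(event):
    """Theoretical probability with equally likely outcomes: |event| / |S|."""
    return Fraction(len(event & S), len(S))


def cond_prob(b, a):
    """Conditional probability P(B | A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)


A = frozenset({2, 4, 6})      # roll is even
B = frozenset({1, 2, 3, 4})   # roll is at most 4
C = frozenset({5})            # roll is exactly 5 (disjoint from A)

# Probability assignment rule: P(S) = 1
assert prob(S) == 1
# Complement rule: P(A) = 1 - P(A^c)
assert prob(A) == 1 - prob(S - A)
# Addition rule for disjoint events: P(A or C) = P(A) + P(C)
assert prob(A | C) == prob(A) + prob(C)
# General addition rule: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# General multiplication rule: P(A and B) = P(A) x P(B | A)
assert prob(A & B) == prob(A) * cond_prob(B, A)
# Independence: P(B | A) = P(B), so P(A and B) = P(A) x P(B)
independent = cond_prob(B, A) == prob(B)

print("P(A) =", prob(A), " P(B) =", prob(B), " P(A and B) =", prob(A & B))
print("P(B | A) =", cond_prob(B, A), " independent:", independent)
```

Using `Fraction` keeps every probability exact, so the rules can be compared with `==` instead of a rounding tolerance. In this example A and B happen to be independent, because knowing the roll is even does not change the chance that it is at most 4.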
Created by: pace_sauce


