BSTAT 5301

Term Definition
Data Facts and figures from which conclusions can be drawn.
What is Statistics? Statistics is a way to get information from data.
Data set The data that are collected for a particular study
Elements A data set consists of elements. Ex: stocks, students, homes for sale, or other entries.
Variable Any characteristic of an element. Ex: price of a stock, height of a student
Measurement A way to assign a value of a variable to an element.
Quantitative The possible measurements are numbers that represent quantities.
Qualitative The possible measurements are descriptive and not numbers.
Cross-sectional data Data collected at the same or approximately the same point in time
Time series data Data collected over different time periods
Population A set of all elements about which we wish to draw conclusions
Census An examination of all elements of a population
Sample A subset of the elements of a population
Descriptive Statistics The science of describing the important aspects of a set of measurements. DOES NOT allow us to draw any conclusions or make any inferences about the data.
Inferential Statistics or Statistical Inference A set of methods used to draw conclusions or inferences about characteristics of populations based on data from a sample. The process of making an estimate, prediction, or decision about a population based on a sample.
Statistical Inference The science of drawing conclusions or inferences about a population from a sample.
Bar Chart, Pie Chart, Pareto Chart, Histogram A form of Descriptive Statistics using Graphical Techniques. Allows statistics practitioners to present data in ways that make it easy for the reader to extract useful information.
Mean, Median Popular numerical techniques in descriptive statistics to describe the location of the data.
Range, Variance, Standard Deviation Numerical techniques in descriptive statistics to measure the variability of the data.
Business analytics The use of traditional and newly developed statistical methods, advances in IS, and techniques from management science to explore and investigate past performance. Types: descriptive analytics, predictive analytics, prescriptive analytics.
Big data Often needs quick analysis to support business decision making.
Descriptive modeling Typically uses data aggregation to provide hindsight and insight into the past and strives to answer: “What has happened?” See also: predictive modeling.
Descriptive analytics The use of traditional and newer graphics to present easy-to-understand visual summaries of up-to-the-minute data. Examples: dot plots, time series plots, bar charts, histograms, dashboards, numerical techniques.
Predictive analytics Methods used to find anomalies, patterns, and associations in data sets to predict future outcomes. Examples: linear regression, logistic regression, decision trees, neural networks, cluster analysis, factor analysis, association rules.
Data mining The use of predictive analytics, algorithms, and IS techniques to extract useful knowledge from huge amounts of data. Examples: k-means algorithm, support vector machines, Bayesian belief networks.
Prescriptive analytics Looks at variables and constraints, along with predictions from predictive analytics, to recommend courses of action. Examples: optimization subroutines, linear programming, nonlinear programming, dynamic programming, integer programming, simulation.
Nominal A qualitative variable for which there is no meaningful ordering, or ranking, of the categories. Example: gender, car color. Only limited statistical techniques are applicable.
Ordinal A qualitative variable for which there is a meaningful ordering, or ranking, of the categories. Example: teaching effectiveness, choice of preference. Only limited statistical techniques are applicable.
Interval Variables Real numbers, e.g. heights, weights, prices, etc. Also referred to as quantitative or numerical data. Arithmetic operations can be performed on interval data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.
Qualitative Variables Nominal and Ordinal. The possible measurements are descriptive and not numbers.
Graphical Descriptive Technique for Nominal/Ordinal (Qualitative) Data Frequency, Relative Frequency, Percentage (%) Frequency, Cumulative Relative Frequency (Ogive), Bar Chart, Pie-Chart, Pareto Analysis, Contingency Table
Graphical Descriptive Technique for Interval (Quantitative) Data Frequency Table, Histogram, Ogive, Dot Plot, Stem-and-Leaf Plot, Scatterplot
frequency distribution A table that summarizes the data by presenting the categories and their counts.
relative frequency distribution Lists the categories and the proportion with which each occurs.
Frequency The number of items in each ‘class’ in the data
Relative frequency Summarizes the proportion of items in each class
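As a minimal sketch of the frequency and relative frequency cards above, the following Python counts how often each category occurs and converts the counts to proportions. The `colors` list is made-up illustration data, and the function names are hypothetical.

```python
from collections import Counter

# Made-up sample of a nominal variable (car color); not from the course.
colors = ["red", "blue", "red", "white", "blue", "red", "white", "red"]

def frequency_distribution(values):
    """Return {category: count} for the given values."""
    return dict(Counter(values))

def relative_frequency_distribution(values):
    """Return {category: proportion of all values in that category}."""
    n = len(values)
    return {category: count / n for category, count in Counter(values).items()}

freq = frequency_distribution(colors)
rel_freq = relative_frequency_distribution(colors)

print(freq)
print(rel_freq)
```

The relative frequencies always sum to 1, which is why they can be read directly as the slice sizes of a pie chart.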
Bar chart A vertical or horizontal rectangle represents the frequency for each category. Height can be frequency, relative frequency, or percent frequency.
Pie chart A circle divided into slices where the size of each slice represents its relative frequency or percent frequency
Pareto principle In many economies, most of the wealth is held by a small minority of the population (the 80%-20% principle). Application: a few classes of defects account for most quality problems in manufacturing.
Development of Pareto Chart Develop a bar chart representing the frequency of occurrence. Bars are arranged in decreasing height from left to right. The chart is augmented by plotting a cumulative percentage point for each bar (the Pareto line).
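The Pareto chart steps above can be sketched numerically: sort the categories by decreasing frequency, then accumulate a running percentage for the Pareto line. The defect log below is invented for illustration, and `pareto_table` is a hypothetical helper name.

```python
from collections import Counter

# Made-up defect log for illustration only.
defects = ["scratch"] * 10 + ["dent"] * 6 + ["misalign"] * 3 + ["stain"] * 1

def pareto_table(values):
    """Return rows of (category, frequency, cumulative percent),
    ordered from the most frequent category to the least frequent."""
    counts = Counter(values).most_common()  # decreasing frequency
    total = sum(count for _, count in counts)
    rows = []
    running = 0
    for category, count in counts:
        running += count
        rows.append((category, count, 100.0 * running / total))
    return rows

table = pareto_table(defects)
for category, count, cumulative_pct in table:
    print(f"{category:10s} {count:3d} {cumulative_pct:6.1f}%")
```

In this invented data the first two defect classes reach 80% of all defects, which is the kind of pattern the Pareto principle describes.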
Cross Classification Table Lists the Frequency of each combination of values for two variables as a first step. To describe the relationship between two nominal variables, we must remember that we are permitted only to determine the frequency of the values.
Contingency Tables Classify data on two dimensions. Rows classify according to one dimension; columns classify according to a second dimension.
Frequency Distribution A list of data classes with the count of values that belong to each class. The frequency distribution is a table.
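A cross-classification (contingency) table can be sketched as a count of each combination of two nominal values. The survey records and the variable names here are assumptions made up for the example.

```python
from collections import Counter

# Made-up records: (department, preferred shift). Illustration only.
records = [
    ("sales", "day"), ("sales", "night"), ("ops", "day"),
    ("ops", "day"), ("sales", "day"),
]

# Frequency of each (row value, column value) combination.
cell_counts = Counter(records)

row_labels = sorted({row for row, _ in records})
col_labels = sorted({col for _, col in records})

# Print the table: rows are departments, columns are shifts.
print("dept".ljust(8) + "".join(label.ljust(8) for label in col_labels))
for row in row_labels:
    cells = "".join(str(cell_counts[(row, col)]).ljust(8) for col in col_labels)
    print(row.ljust(8) + cells)
```

A `Counter` returns 0 for combinations that never occur, so empty cells need no special handling.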
Histogram The histogram is a picture of the frequency distribution
K K is the number of classes. K = 1 + 3.3 log10(n)
n n is the number of elements within the sample.
N N is the number of elements in the entire population.
Length or Width of a class (Max - Min) / K
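The two formulas above, K = 1 + 3.3 log10(n) and (Max - Min) / K, can be sketched as follows. Rounding K up to a whole number is an assumption of this sketch; the cards do not say how to round.

```python
import math

def number_of_classes(n):
    """K = 1 + 3.3 * log10(n), rounded up (rounding is an assumption)."""
    return math.ceil(1 + 3.3 * math.log10(n))

def class_width(values, k):
    """Class length: (Max - Min) / K."""
    return (max(values) - min(values)) / k

# Made-up example: 100 measurements ranging from 10 to 90.
k = number_of_classes(100)            # 1 + 3.3 * 2 = 7.6, rounded up to 8
width = class_width([10, 35, 90], k)  # (90 - 10) / 8
print(k, width)
```

In practice the width is often rounded to a convenient value so that class boundaries are easy to read.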
Skewed to the right The right tail of the histogram is longer than the left tail
Skewed to the left The left tail of the histogram is longer than the right tail
Symmetrical The right and left tails of the histogram appear to be mirror images of each other
Cumulative Distributions Use the same number of classes, class lengths, and class boundaries used for the frequency distribution. Rather than a count for each class, record the number of measurements that are less than the upper boundary of that class: a running total.
Ogive A graph of a cumulative distribution
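A cumulative distribution is a running total of the class frequencies, and the ogive plots those totals against the upper class boundaries. A minimal sketch, with made-up class boundaries and frequencies:

```python
from itertools import accumulate

# Made-up classes: upper boundaries and the frequency of each class.
class_upper_bounds = [10, 20, 30, 40]
frequencies = [3, 7, 8, 2]

# Running total: measurements below each upper class boundary.
cumulative = list(accumulate(frequencies))

# Points of the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(class_upper_bounds, cumulative))
print(ogive_points)
```

The last cumulative frequency always equals the total number of measurements.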
Frequency Polygons Plot a point above each class midpoint at a height equal to the frequency of the class. Useful when comparing two or more distributions.
Dot Plots A dot is placed on a real number line for each measurement. Useful for quickly detecting potential outliers.
Stem-and-Leaf Displays Purpose is to see the overall pattern of the data by grouping the data into classes. Shows the variation from class to class, the amount of data in each class, and the distribution of the data within each class. Best for small to moderately sized data sets.
Scatter Plots Used to study relationships between two variables. Each data point has two dimensions. Place one variable on the x-axis and a second variable on the y-axis, then place a dot at each pair of coordinates.
Linear A straight line relationship between the two variables
Linear Positive When one variable goes up, the other variable goes up
Linear Negative When one variable goes up, the other variable goes down
No linear relationship There is no coordinated linear movement between the two variables
Data Warehouses Data warehousing is a process for centralized data management and retrieval; its ideal objective is the creation and maintenance of a central repository for all of an organization's data.
Response variable vs factors When initiating a study, we first define our variable of interest, or response variable. Other variables that may be related to the response variable are typically called factors.
Experimental Study Means we are able to set or manipulate the values of the factors.
Observational Study Means we are not able to control the factors.
Sample with or without replacement When sampling with replacement, the selection is placed back into the population and can potentially be selected again. Sampling without replacement allows each selection to be chosen only once because it is not placed back into the population.
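The difference between the two sampling schemes can be sketched with Python's standard `random` module. The population of ten numbered elements and the fixed seed are assumptions chosen only to make the example repeatable.

```python
import random

# Made-up finite population of 10 numbered elements.
population = list(range(1, 11))
rng = random.Random(0)  # fixed seed so the run is repeatable

# With replacement: each pick goes back, so repeats are possible.
with_replacement = rng.choices(population, k=5)

# Without replacement: each element can be chosen at most once.
without_replacement = rng.sample(population, k=5)

print(with_replacement)
print(without_replacement)
```

Note that `rng.sample` raises `ValueError` if asked for more elements than the population holds, while `rng.choices` can draw any number.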
Finite population vs infinite population Finite population, no more can be added (ex. Number of cars produced in a specific year). Infinite populations can potentially always have one more added (ex. All car models that could be produced, because in theory one more car could always be produced).
Statistical Model A Statistical model is a set of assumptions about how sample data are selected and about the population from which the sample data are selected. The assumptions concerning the sampled populations often specify the probability distributions.
Probability Distribution A theoretical equation, graph, or curve that can be used to find the probability or likelihood that a measurement or observation randomly selected from a population will equal a value or fall into a range of values.
Anomaly or Outlier A value or measure that is atypical or situated away from the general group or cluster of other values.
Measures of Central Tendency Mean, Median, Mode
Measures of Variability Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing Percentiles, Quartiles, Deciles
Measures of Linear Relationship (2 variables) Covariance, Correlation Coefficient, Coefficient of Determination, Least-Square Line
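The four measures of linear relationship listed above can be sketched from their standard sample definitions (dividing by n - 1, a convention the cards do not state). The function name and the data are hypothetical.

```python
def linear_relationship(xs, ys):
    """Return sample covariance, correlation coefficient,
    coefficient of determination, and the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample variances (divide by n - 1).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    r = cov / (var_x * var_y) ** 0.5
    slope = cov / var_x
    intercept = mean_y - slope * mean_x
    return {
        "covariance": cov,
        "correlation": r,
        "determination": r ** 2,
        "slope": slope,
        "intercept": intercept,
    }

# Made-up data with a perfect positive linear relationship (y = 2x).
result = linear_relationship([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(result)
```

A correlation near +1 corresponds to the "Linear Positive" card, near -1 to "Linear Negative", and near 0 to no linear relationship.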
Ogive A graph of a cumulative distribution. Plot a point above each upper class boundary at the height of the cumulative frequency, then connect the points with line segments.
Mean The average or expected value: the sum of the observations divided by the number of observations.
Median The value of the middle point of the ordered measurements. One advantage the median holds is that it is not as sensitive to extreme values as the mean is.
Mode The most frequent value
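The three measures of central tendency can be sketched with Python's standard `statistics` module. The data set is made up and includes one extreme value (40) to show how the mean reacts to it while the median does not.

```python
import statistics

# Made-up measurements with one extreme value.
data = [2, 3, 3, 5, 7, 40]

mean = statistics.mean(data)      # sum of observations / number of observations
median = statistics.median(data)  # middle of the ordered values (average of 3 and 5)
mode = statistics.mode(data)      # most frequent value

print(mean, median, mode)
```

Here the mean (10) is pulled toward the extreme value, while the median (4.0) and mode (3) stay with the bulk of the data.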
Greek letter “mu” Arithmetic mean for a population
"x-bar" Arithmetic mean for a sample
MMM for Symmetrical Curve The mode, mean, and median are all at the same point, which is the highest peak.
MMM for Skewed to the right Order goes towards the tail. Mode is the highest peak, median "which is the longest word of the three" is in the middle, then mean (mode < median < mean).
MMM for Skewed to the left Order goes towards the tail. Mode is the highest peak, median "which is the longest word of the three" is in the middle, then mean (mean < median < mode).
M little o subscript Mode
M little d subscript Median
Range Largest observation (minus) smallest observation
Variance Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures.
Population variance symbol Lowercase Greek letter “sigma” squared (σ²).
Sample variance symbol Lowercase “s” squared (s²).
Population Standard Deviation Square root of the population variance. Symbol: lowercase Greek letter “sigma” (σ).
Sample Standard Deviation Square root of the sample variance. Symbol: lowercase "s".
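As a sketch of the population and sample versions of variance and standard deviation, the standard library distinguishes them by name. The usual convention, not spelled out on the cards, is that the population variance divides the sum of squared deviations by N and the sample variance divides by n - 1. The data set is made up.

```python
import statistics

# Made-up measurements; mean is 5, sum of squared deviations is 32.
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # 32 / 8
population_std = statistics.pstdev(data)          # square root of the above
sample_variance = statistics.variance(data)       # 32 / 7
sample_std = statistics.stdev(data)               # square root of the above

print(population_variance, population_std)
print(sample_variance, sample_std)
```

The sample versions are slightly larger because they divide by n - 1 rather than N.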
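68% For a bell-shaped (approximately normal) distribution, approximately 68% of all observations fall within one standard deviation of the mean.
95% For a bell-shaped (approximately normal) distribution, approximately 95% of all observations fall within two standard deviations of the mean.
99.7% For a bell-shaped (approximately normal) distribution, approximately 99.7% of all observations fall within three standard deviations of the mean.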
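The 68%, 95%, and 99.7% figures can be checked against a normal distribution, for which they hold. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 or later); `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution lying within
    k standard deviations of its mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
```

For data that are strongly skewed or otherwise far from bell-shaped, these percentages should not be expected to hold.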
Percentile The nth percentile of a data set is a value such that n% of the data have values less than it and (100 - n)% have values greater than it. The 0th percentile is the minimum. The 100th percentile is the maximum.
Quartile By 25%. Q1, the first quartile, is the 25th percentile: 25% of the data are less than or equal to it. Q2, the second quartile, is the 50th percentile (the median). Q3, the third quartile, is the 75th percentile. Q4, the fourth quartile, is the 100th percentile.
Decile By 10%. 1st decile: 10th percentile (lower decile). 2nd decile: 20th percentile. 5th decile: 50th percentile (median). 9th decile: 90th percentile (upper decile).
Quintile By 20%. 1st quintile is the 20th percentile, 2nd quintile is the 40th percentile, 3rd quintile is the 60th percentile, 4th quintile is the 80th percentile, 5th quintile is the 100th percentile.
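Quartiles and deciles can be sketched with `statistics.quantiles`, which returns the cut points that divide the data into n equal groups. Textbooks and software use slightly different interpolation conventions, so the `method="inclusive"` choice here is an assumption and other tools may give slightly different values on the same data. The data set is made up.

```python
import statistics

# Made-up ordered measurements.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 gives the three cut points Q1, Q2, Q3.
quartiles = statistics.quantiles(data, n=4, method="inclusive")

# n=10 gives the nine cut points for the 1st through 9th deciles.
deciles = statistics.quantiles(data, n=10, method="inclusive")

print(quartiles)
print(deciles)
```

The second quartile and the fifth decile are both the 50th percentile, so they match the median.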
5 number summary Min, Q1, Q2 (median), Q3, Max
Interquartile range (IQR) Q3 - Q1. A measure of variability, like the standard deviation, but less sensitive to extreme values.
Box-whisker plot Built from the 5 number summary. Upper limit = Q3 + 1.5 * IQR. Lower limit = Q1 - 1.5 * IQR. Inner fence (left whisker): the smallest number in the data set at or above the lower limit. Outer fence (right whisker): the largest number in the data set at or below the upper limit.
Inner fence (left whisker) The smallest number in the data set at or above the lower limit
Outer fence (right whisker) The largest number in the data set at or below the upper limit
Outliers Data values that are lower than the inner fence or greater than the outer fence.
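The limits, whiskers, and outliers described above can be sketched as follows. The quartile convention (`method="inclusive"`) is an assumption, since quartile values vary slightly between textbooks and tools, and the data set is made up with one obvious outlier.

```python
import statistics

# Made-up measurements; 30 is an intended outlier.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]

q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

upper_limit = q3 + 1.5 * iqr
lower_limit = q1 - 1.5 * iqr

# Whiskers end at the most extreme data values still inside the limits.
left_whisker = min(v for v in data if v >= lower_limit)
right_whisker = max(v for v in data if v <= upper_limit)

# Anything beyond the whiskers is flagged as an outlier.
outliers = [v for v in data if v < left_whisker or v > right_whisker]

print(q1, q2, q3, iqr)
print(lower_limit, upper_limit)
print(left_whisker, right_whisker, outliers)
```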
Strata The subpopulations in a stratified sampling design
Random Sample Population A sample selected in such a way that every set of n elements in the population has the same chance of being selected.
Created by: Mixt