STATISTICS REVIEW

Term / Definition
Statistics Etymology The word statistics is derived from the Latin word status, meaning "state".
Statistics is a Science that deals with the collection, presentation, analysis and interpretation of data.
Collection refers to the gathering of information or data.
Organization or Presentation involves summarizing data in textual, graphical or tabular form.
Analysis involves describing the data by statistical methods or procedures.
Interpretation refers to the process of making conclusions based on the analyzed data
variable is a characteristic or attribute that can assume different values.
Data are the values (measurements or observations) that the variables can assume.
random variables Variables whose values are determined by chance
data set A collection of data values
data value or datum Each value in the data set
Descriptive statistics collection, organization, summarization, and presentation of data.
descriptive statistics the statistician tries to describe a situation.
Inferential statistics generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Inferential statistics the statistician tries to make inferences from samples to populations
Inferential statistics uses probability
population consists of all subjects (human or otherwise) that are being studied.
sample is a group of subjects selected from a population.
parameter is a numerical summary or any measurement coming from a population.
statistic is a measurement from a sample.
Qualitative variables are variables that can be placed into distinct categories, according to some characteristic or attribute.
Quantitative Variables are numerical in nature.
Quantitative Variables These are obtained from counting (discrete) or measuring (continuous).
Quantitative Variables meaningful arithmetic operations can be done with these kinds of data.
Dependent Variable a variable that is affected or influenced by another variable.
Independent Variable one that affects, or influences another variable
nominal scale the categories of a qualitative variable are unordered
nominal scale is used when we want to distinguish one object from another for identification purposes. We can only say that one object is different from another; the amount of difference cannot be determined.
ordinal scale the categories of a qualitative variable can be put in order
ordinal scale data are arranged in some specific order or rank. When objects are measured in this level, we can say that one is greater than the other, but we cannot tell how much more one has than the other.
interval scale one can compare the differences between measurements of the quantitative variable meaningfully, but not the ratio of the measurements
interval scale When data are measured at this level, we can say not only that one value is greater or less than another but also specify the amount of difference between them.
ratio scale one can compare both the differences between measurements of the quantitative variable and the ratio of the measurements meaningfully
ratio scale the scale always starts from an absolute or true zero point.
Data consists of information coming from observations, counts, measurements, or responses
Primary Collected specifically for the analysis desired. The most common type is a survey.
Secondary have already been collected/compiled and are available for statistical analysis
Survey Study systematic method for gathering information. It is an investigation of one or more characteristics of a population
The Direct or Interview Method In this method, the researcher has a direct contact with the interviewee.
The Direct or Interview Method The researcher obtains the information needed by asking the interviewee questions.
The Direct or Interview Method This method gives precise and consistent information because clarifications can be made.
The Direct or Interview Method this method is time consuming, expensive, and has limited field coverage.
The Indirect or Questionnaire Method This method makes use of a written questionnaire.
The Indirect or Questionnaire Method The researcher distributes the questionnaire to the respondents either by personal delivery or by mail.
The Indirect or Questionnaire Method Using this method, the researcher can save a lot of time and money in gathering the information needed because questionnaires can be given to a large number of respondents at the same time.
The Indirect or Questionnaire Method the researcher cannot expect that all distributed questionnaires will be retrieved because some respondents simply ignore the questionnaires.
The Indirect or Questionnaire Method In addition, clarification cannot be made if the respondent does not understand the question.
The Registration Method This method of collecting data is governed by laws.
Retrospective Study Uses either all or sample data and is also called historical data.
Retrospective Study Quickest and easiest way to collect process data.
Retrospective Study Provides limited information.
a type of primary data Data recorded internally by a company, such as sales and transactions
Primary data is data that is collected directly from the source for a specific purpose or research question, and it has not been previously collected or analyzed by others.
Observational Study Simply observes the process or population during a period of routine operation
Observational Study Researcher interacts/disturbs the process only as much as is required to obtain data on the system.
Observational Study May give valuable information but is usually limited because you just altered a part of the system
The Experimental Design This method is usually used to find out cause and effect relationships.
The Experimental Design Scientific researchers often use this method.
The Experimental Design We can establish cause-and-effect relationship unlike retrospective and observational studies where we are just informed about any interesting phenomena.
Simulation Study Cost-effective, time efficient, safe-testing, increased understanding, optimization
Simulation data gathering refers to the process of collecting data from a simulation, which is a computer model of a system that mimics its behavior.
Simulation is a powerful technique and can be used to model many different types of systems.
Requirements of a Good Sample a "scaled-down" version of the population, mirroring every characteristic of the whole population.
Observation Unit/element basic unit of observation, an object on which a measurement is taken.
Target Population the complete collection of observations we want to study.
Sample subset of a population.
Sampled Population the collection of all possible observation units that might have been chosen in a sample.
Sampling unit a unit that can be selected for a sample.
Sampling frame A list, map, or other specification of sampling units in the population from which a sample may be selected.
Selection Bias It occurs when some part of the target population is not in the sampled population, or, more generally, when some population units are sampled at a different rate than intended by the investigator
A good sample will be as free from selection bias as possible and has accurate responses to the items of interest.
Measurement Error When a response in the survey differs from the true value
Sampling error the error that results from taking one sample instead of examining the whole population.
Non-sampling error selection bias and measurement error are types of non-sampling error.
Non-sampling error These are the errors that cannot be attributed to the sample-to-sample variability.
Simple probability samples each unit in the population has a known probability of selection
Simple probability samples random number table or other randomization mechanism is used to choose the specific units to be included in the sample.
Simple probability samples Investigator can use a relatively small sample to make inferences about an arbitrarily large population
Simple random sampling The most basic form of probability sampling and provides theoretical basis for the more complicated forms.
Simple Random Sample with Replacement (SRSWR) One unit is randomly selected from the population to be the first sampled unit, with probability 1/N; it is returned to the population before the next draw, so the sample might include duplicates.
Simple Random Sample without Replacement (SRSWOR) is generally preferred over Simple Random Sample with Replacement (SRSWR)
Simple Random Sample without Replacement (SRSWOR) This sample is selected so that every possible subset of n distinct units in the population has the same probability of being selected as the sample.
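The two simple random sampling schemes above can be sketched with Python's standard `random` module. This is a minimal illustration, assuming a hypothetical population of unit IDs stored in a list; `srswr` and `srswor` are illustrative names, not library functions.

```python
import random

def srswr(population, n, rng):
    """SRSWR: each of the n draws picks any unit with probability 1/N,
    and the unit is returned before the next draw, so duplicates can occur."""
    return [rng.choice(population) for _ in range(n)]

def srswor(population, n, rng):
    """SRSWOR: n distinct units; every subset of size n is equally likely."""
    return rng.sample(population, n)

rng = random.Random(42)            # seeded only so the run is reproducible
population = list(range(1, 21))    # hypothetical population, N = 20 unit IDs
print("with replacement:   ", srswr(population, 5, rng))
print("without replacement:", srswor(population, 5, rng))
```

`random.sample` never returns the same element twice, which is exactly the "n distinct units" property of SRSWOR.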
Systematic sampling It is used as a proxy for simple random sampling when no list of the population is available.
Systematic sampling Selection of individuals is based on a predetermined interval k (the sampling interval), starting from a randomly chosen point.
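A small sketch of systematic selection, assuming the units can be visited in some fixed order (modelled here as a hypothetical list) and, for simplicity, that N is a multiple of n so that k = N / n. The function name is illustrative only.

```python
import random

def systematic_sample(population, n, rng):
    """Pick every k-th unit after a random start, where k = N // n.
    Assumes, for simplicity, that N is a multiple of n."""
    k = len(population) // n       # sampling interval
    start = rng.randrange(k)       # random starting position within the first interval
    return population[start::k][:n]

population = list(range(1, 101))   # hypothetical ordered list of N = 100 units
print(systematic_sample(population, 10, random.Random(7)))
```

Only the starting point is random; every later unit is fixed by the interval k.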
Stratified random sampling To stratify a population means to classify or separate people into groups according to some of their characteristics, such as rank, income, education, sex, or ethnic background.
strata They partition the population into subclasses with notable distinctions.
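A minimal sketch of stratified random sampling with proportional allocation (one common choice; other allocation rules exist). The population and the education-level stratifier are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, rng):
    """Proportional stratified sampling: split units into strata, then take a
    simple random sample (without replacement) of the same fraction of each."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)   # proportional allocation
        sample.extend(rng.sample(members, size))
    return sample

# Hypothetical population of 40 people, stratified by education level
people = [(i, "college" if i % 4 == 0 else "high school") for i in range(40)]
sample = stratified_sample(people, lambda p: p[1], 0.5, random.Random(1))
print(len(sample))  # 20
```

With 10 college and 30 high-school members, a 50% fraction draws 5 and 15 respectively, so each stratum is represented in proportion to its size.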
Cluster sampling is similar to stratified random sampling in that the total population is divided into groups (clusters), but here simple random sampling is used to select whole clusters, and the units in the selected clusters make up the sample.
Clusters are usually based on geographic area.
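In the usual one-stage form of cluster sampling, whole clusters are chosen at random and every unit inside a chosen cluster is included. The sketch below assumes hypothetical geographic clusters (districts) holding household IDs.

```python
import random

def cluster_sample(clusters, num_clusters, rng):
    """One-stage cluster sampling: randomly choose whole clusters, then keep
    every unit in each chosen cluster."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical geographic clusters (districts) with their household IDs
districts = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1", "W2", "W3"],
}
picked = cluster_sample(districts, 2, random.Random(5))
households = [h for members in picked.values() for h in members]
print(sorted(picked), households)
```

Unlike stratified sampling, the randomization happens at the cluster level, not inside each group.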
Non-probability sampling is a method of selecting sampling units from a target population using a subjective or non-random method.
Convenience sampling The sample is selected based on accessibility or convenience.
Convenience sampling It is the least effective of the non-probability sampling methods but if there are logistics and time constraints, it may be the only option.
Purposive Sampling A method for obtaining sample units where researchers use their expertise to choose qualified participants to take the survey that will help the research study meet its goals.
Purposive Sampling The researchers pick these participants purposively.
Quota sampling The sample is selected based on certain quotas or predetermined criteria, such as age, educational attainment, gender or income level.
Quota sampling is one of the most preferred methods of non-probability sampling because it forces the inclusion of members of different subpopulations.
Snowball sampling The sample is selected based on referrals from other members of the population.
Snowball sampling This type of sampling is used if the population of interest is hard to find, such as people with disabilities or certain diseases, drug users, or victims of a specific crime.
raw data Information obtained by observing values of a variable
qualitative data Data obtained by observing values of a qualitative variable
quantitative data Data obtained by observing values of a quantitative variable
discrete data Quantitative data obtained from a discrete variable
continuous data quantitative data obtained from a continuous variable
Ungrouped data are data that are not organized or, if arranged, are only sorted from highest to lowest or lowest to highest.
Grouped data are data that are organized and arranged into different classes or categories.
Tabular method By organizing the data in tables, important features about the data can be readily understood and comparisons are easily made.
Table Heading consists of the table number and the title
Column Header It describes the data in each column.
Row Classifier It shows the classes or categories.
Body This is the main part of the table.
Source Note This is placed below the table when the data written are not original.
Frequency Distribution Table The most commonly used method in presenting data by tabular method
frequency distribution is the organization of raw data in table form, using classes and frequencies
Frequency Distribution Table (FDT) is a statistical table showing the frequency or number of observations contained in each of the defined classes or categories.
frequency distribution for qualitative data lists all categories and the number of elements that belong to each of the categories
relative frequency is obtained by dividing the frequency (f) for a category by the sum of all the frequencies (n), i.e., f/n
relative frequency They are commonly expressed as percentages
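A frequency distribution for qualitative data, with relative frequencies f/n expressed as percentages, can be sketched with `collections.Counter`. The blood-type data are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (category, f, relative frequency in %), with rf = f / n * 100."""
    counts = Counter(values)
    n = sum(counts.values())
    return [(category, f, 100 * f / n) for category, f in counts.most_common()]

blood_types = ["O", "A", "B", "O", "AB", "O", "A", "O", "B", "A"]  # hypothetical data
for category, f, rf in frequency_table(blood_types):
    print(f"{category:>2}  f = {f}  rf = {rf:.0f}%")
```

The relative frequencies always add up to 100%, which is a quick check that no category was dropped.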
Class Limits endpoints of a class interval
Upper Class Limit represents the largest data value that can be included in the class.
Lower Class Limit represents the smallest data value that can be included in the class.
Class boundaries used to separate the classes so that there are no gaps in the frequency distribution
Lower boundary Lower limit - 0.5 (for data recorded as whole numbers)
Upper boundary Upper limit + 0.5 (for data recorded as whole numbers)
Class width (i) the difference between the boundaries for any class, i.e., i = upper boundary - lower boundary, or i = (upper limit - lower limit) + 1
Class mark the midpoint of the class
less than cumulative frequency (<cf) total number of observations less than the upper boundary of a class interval
greater than cumulative frequency (>cf) total number of observations greater than the lower boundary of a class interval
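The class-boundary, class-width, class-mark and cumulative-frequency rules above can be computed together. This sketch assumes whole-number data (so boundaries are limits ± 0.5) and a hypothetical three-class distribution of quiz scores.

```python
def class_summary(limits, freqs):
    """For whole-number data: boundaries, width, class mark, <cf and >cf
    for each class given as (lower limit, upper limit)."""
    n = sum(freqs)
    rows, less_cf = [], 0
    for (lower, upper), f in zip(limits, freqs):
        less_cf += f                        # observations below the upper boundary
        rows.append({
            "boundaries": (lower - 0.5, upper + 0.5),
            "width": (upper - lower) + 1,
            "mark": (lower + upper) / 2,
            "f": f,
            "<cf": less_cf,
            ">cf": n - less_cf + f,         # observations above the lower boundary
        })
    return rows

# Hypothetical frequency distribution of quiz scores
for row in class_summary([(10, 14), (15, 19), (20, 24)], [3, 5, 2]):
    print(row)
```

The last <cf and the first >cf both equal n, the total number of observations.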
Graphical Method The purpose of graphs in statistics is to convey the data to the viewers in pictorial form.
Graphical Method It is easier for most people to comprehend the meaning of data presented graphically than data presented numerically in tables or frequency distributions.
Bar Graph is a graph composed of bars whose heights are the frequencies of the different categories.
Bar Graph displays graphically the same information concerning qualitative data that a frequency distribution shows in tabular form.
Pie Chart is also used to graphically display qualitative data
Histogram is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.
Frequency Polygon is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
Ogive is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
Ogive is a graph in which a point is plotted above each class boundary at a height equal to the cumulative frequency corresponding to that boundary.
measure of central tendency gives a single value that acts as a representative or average of the values of all the outcomes of your data set.
central tendency is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores or measures.
goal of central tendency is to identify the single value that is the best representative for the entire set of data.
measure of central tendency Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude
mean, median, and mode most commonly used measures
Mean is the most commonly used measure of central tendency.
Computation of the mean requires scores that are numerical values measured on an interval or ratio scale.
Mean is obtained by computing the sum, or total, for the entire set of scores(data), then dividing this sum by the number of scores.
Median is defined as the midpoint of the list.
Median divides the scores so that 50% of the scores in the distribution have values that are equal to or less than the median
Median Computation of the median requires scores that can be placed in rank order (smallest to largest) and are measured on an ordinal, interval, or ratio scale
Mode is defined as the most frequently occurring category or score in the distribution.
Mode is the category or score corresponding to the peak or high point of the distribution.
Mode is not necessarily unique. A data set can have more than one mode.
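Python's standard `statistics` module covers the three measures of central tendency directly. The scores below are hypothetical and were chosen so the set has two modes; `multimode` needs Python 3.8 or later.

```python
import statistics

scores = [15, 11, 24, 12, 15, 18, 11, 14]   # hypothetical scores

print(statistics.mean(scores))       # sum / count = 120 / 8
print(statistics.median(scores))     # average of the two middle values, (14 + 15) / 2
print(statistics.multimode(scores))  # every most frequent value: 15 and 11 both occur twice
```

`statistics.mode` would report only one of the two modes, so `multimode` is the better fit when a data set may be bimodal.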
Measures of Variability or Dispersion are measures of the average distance of each observation from the center of the distribution.
Measures of Variability or Dispersion tell us how spread out the scores are
Measures of Variability or Dispersion They summarize and describe the extent to which scores in a distribution differ from each other.
Measures of absolute dispersion Range, Variance, Standard Deviation
Measures of relative dispersion coefficient of variation
Measures of Absolute Dispersion are expressed in the units of the original observations.
Measures of Absolute Dispersion They cannot be used to compare variations of two data sets when the averages of these sets differ a lot in value or when the observations differ in units of measurements.
Range is the difference between the highest and the lowest values.
Range This is the simplest but most unreliable measure of dispersion since it only uses two values in the distribution
Variance is the average of the squared deviations of each score from the mean
Standard Deviation is the square root of the average of the squared deviation of each score from the mean, or simply, the square root of the variance.
Measures of Relative Dispersion are unitless measures and are used when one wishes to compare the scatter of one distribution with another distribution.
Coefficient of Variation is the ratio of the standard deviation to the mean and is usually expressed in percentage
Coefficient of Variation It is used to compare variability of two or more sets of data even when they are expressed in different units of measurements.
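A sketch of the absolute and relative measures of dispersion above, using the population versions (dividing by N) to match the "average of the squared deviations" definition. The data are hypothetical; the sample variance, which divides by n - 1, is shown for comparison.

```python
import statistics

def dispersion(data):
    """Range, population variance and SD (dividing by N), and CV in percent."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    return {
        "range": max(data) - min(data),
        "variance": statistics.pvariance(data),
        "sd": sd,
        "cv_percent": 100 * sd / mean,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements
print(dispersion(data))
print(statistics.variance(data))  # sample variance: divides by n - 1 instead of n
```

Because the CV is unitless, the same function could compare spread across data sets measured in different units.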
fractiles or quantiles are values below which a specific fraction or percentage of the observations in a given set must fall.
Quartiles, Deciles and Percentiles fractiles of special interest
Measures of Location or Position several measures that describe or locate the position of non-central pieces of data relative to the entire set of data.
Standard z-Score measures how many standard deviations an observation is above or below the mean
Percentiles are values that divide a set of observations into 100 equal parts
Deciles are values that divide a set of observations into 10 equal parts
Quartiles are values that divide a set of observations into 4 equal parts
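The z-score is a one-line formula, and `statistics.quantiles` computes quartiles, deciles or percentiles. The example values are hypothetical. Note that textbooks use several slightly different interpolation rules for quantiles, so hand-computed answers may differ from the module's default (`method="exclusive"`) on some data sets.

```python
import statistics

def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

print(z_score(90, 80, 5))   # 2.0 -> two SDs above a hypothetical mean of 80

data = list(range(1, 12))                       # hypothetical data: 1, 2, ..., 11
print(statistics.quantiles(data, n=4))          # quartiles Q1, Q2, Q3
print(statistics.quantiles(data, n=10))         # deciles D1..D9
```

Passing `n=100` in the same call would return the 99 percentiles.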
Measures of Shapes describe the shape of a certain distribution
Histogram can give a general idea of the shape
skewness and kurtosis two numerical measures of shape that can give a more precise evaluation
Skewness refers to the degree of symmetry or asymmetry of a distribution
normal distribution is bell-shaped and symmetric through the mean
normal distribution It has the property mean=median=mode
distribution skewed to the left the mean is less than the median
negatively skewed The bulk of the distribution is on the right
distribution skewed to the right the mean is greater than the median
positively skewed The bulk of the distribution is on the left
Kurtosis refers to the peakedness or flatness of a distribution
Mesokurtic has the same peakedness as the normal distribution (the normal distribution itself is mesokurtic)
Leptokurtic is more peaked than the normal distribution
Platykurtic is flatter than the normal distribution
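One common way to measure shape numerically is with moment coefficients: skewness m3 / m2^1.5 and excess kurtosis m4 / m2^2 - 3, which is 0 for a normal distribution. This sketch uses that choice (other formulas, such as Pearson's coefficient of skewness, also exist) on small hypothetical data sets.

```python
def shape(data):
    """Moment coefficient of skewness and excess kurtosis (0 for a normal curve)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def describe(data, tol=1e-9):
    skew, kurt = shape(data)
    side = "symmetric" if abs(skew) < tol else ("skewed right" if skew > 0 else "skewed left")
    peak = "mesokurtic" if abs(kurt) < tol else ("leptokurtic" if kurt > 0 else "platykurtic")
    return side, peak

print(describe([1, 2, 3, 4, 5]))     # ('symmetric', 'platykurtic')
print(describe([1, 1, 1, 2, 10]))    # long right tail -> skewed right
```

A positive skewness goes with mean > median (skewed right); a negative one with mean < median (skewed left).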
experiment is the process of observing a phenomenon that has variation in its outcomes
experiment It is a well-defined action leading to a single, well-defined result
outcome is a result from a single trial of an experiment
Sample Space The set of all possible outcomes of an experiment
event is a collection of some outcomes from an experiment (a subset of the sample space)
simple event An event containing only one element
compound event is one that can be expressed as a union of simple events.
null set or empty set is a subset of the sample space that contains no elements
null set or empty set It is denoted by ∅
union of two events A and B denoted by A ∪ B
union of two events A and B is the event containing all elements that belong to A, to B, or to both
complement of an event A is the set of all elements of S that are not in A
intersection of two events A and B denoted by A ∩ B
intersection of two events A and B is the event containing all elements common to both A and B
A and B have no elements in common Two events A and B are mutually exclusive if A ∩ B = ∅
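Python's built-in `set` type mirrors these event operations directly. The die-roll events below are a hypothetical example.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space: one roll of a die
A = {2, 4, 6}            # event: an even number
B = {4, 5, 6}            # event: a number greater than 3
C = {1, 3}               # event: a 1 or a 3

print(A | B)             # union A ∪ B        -> {2, 4, 5, 6}
print(A & B)             # intersection A ∩ B -> {4, 6}
print(S - A)             # complement of A    -> {1, 3, 5}
print(not (A & C))       # True: A and C are mutually exclusive (A ∩ C = ∅)
```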
Multiplication Rule If an operation can be performed in n₁ ways, and if for each of these a second operation can be performed in n₂ ways, then the two operations can be performed together in n₁n₂ ways
Generalized Multiplication Rule If an operation can be performed in n₁ ways, if for each of these a second operation can be performed in n₂ ways, if for each of the first two a third operation can be performed in n₃ ways, and so forth, then the sequence of k operations can be performed in n₁n₂⋯nₖ ways
Permutation is an ordered arrangement of all or part of a set of objects.
Combination is an arrangement of objects without regard to order
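The counting rules above map onto functions in Python's `math` module (`prod`, `perm` and `comb` need Python 3.8 or later). The outfit and letter examples are hypothetical; the `itertools` calls list the arrangements explicitly as a cross-check.

```python
import math
from itertools import combinations, permutations

# Multiplication rule: 3 shirts and 4 pairs of pants give 3 * 4 outfits
print(3 * 4)                                  # 12
# Generalized rule: 2 * 3 * 4 for three successive operations
print(math.prod([2, 3, 4]))                   # 24

letters = "ABCDE"
print(math.perm(5, 3))                        # ordered arrangements of 3 from 5: 60
print(math.comb(5, 3))                        # unordered selections of 3 from 5: 10

# Cross-check by listing the arrangements explicitly
assert math.perm(5, 3) == len(list(permutations(letters, 3)))
assert math.comb(5, 3) == len(list(combinations(letters, 3)))
```

Each combination of 3 letters corresponds to 3! = 6 permutations, which is why 60 / 6 = 10.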
Probability refers to the likelihood of occurrence of an event
Subjective Probability chance of occurrence is given by a particular person based on his/her educated guess, opinion, intuition or beliefs
Empirical Probability probability is assigned based on prior knowledge of events that happened in the past, or based on research or experiment
Classical Probability applied when all possible outcomes are equally likely to happen
Conditional Probability (that B occurs given A has occurred) The probability of an event B occurring when it is known that some event A has occurred, written P(B|A)
Independent Events A occurring does not affect the probability of B occurring
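Classical probability, conditional probability P(B|A) = P(A ∩ B) / P(A) and the independence check can all be computed by enumerating equally likely outcomes. This sketch assumes two fair dice; exact `Fraction` arithmetic avoids rounding.

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))   # two fair dice: 36 equally likely outcomes

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def cond_prob(b, a):
    """P(B | A) = P(A ∩ B) / P(A)."""
    return prob(lambda s: a(s) and b(s)) / prob(a)

def first_is_6(s):
    return s[0] == 6

def second_even(s):
    return s[1] % 2 == 0

def sum_is_8(s):
    return s[0] + s[1] == 8

print(cond_prob(second_even, first_is_6), prob(second_even))  # 1/2 1/2 -> independent
print(cond_prob(sum_is_8, first_is_6), prob(sum_is_8))        # 1/6 5/36 -> not independent
```

Knowing the first die shows 6 leaves the chance of an even second die unchanged, but it does change the chance that the total is 8.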
Random Variable is a variable whose values are determined by chance
Discrete Random Variables have a finite number of possible values or an infinite number of values that can be counted
Continuous Random Variables variables that can assume all values in the interval between any two given values
discrete probability distribution consists of the values a random variable can assume and the corresponding probabilities of the values
discrete probability distribution The probabilities are determined theoretically or by observation.
probability mass function The probability distribution of a discrete random variable
family of probability distributions The collection of all probability distributions for different values of the parameter
parameter of the distribution a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution
cumulative distribution function $F(x)$ of a discrete random variable $X$ with pmf $p(x)$ is defined for every number $x$ by $F(x) = P(X \le x) = \sum_{y:\, y \le x} p(y)$
mean or expected value of a probability distribution is the theoretical average of the variable
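A discrete probability distribution, its cdf F(x) = P(X ≤ x) and its expected value can be written out directly. The example assumes X is the number of heads in two tosses of a fair coin.

```python
from fractions import Fraction

# pmf of X = number of heads in two tosses of a fair coin
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
assert sum(pmf.values()) == 1          # probabilities of a pmf sum to 1

def cdf(x):
    """F(x) = P(X <= x): add p(y) over every value y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

expected = sum(x * p for x, p in pmf.items())   # E(X) = sum of x * p(x)

print(cdf(1), cdf(2), expected)                 # 3/4 1 1
```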
binomial distribution The outcomes of a binomial experiment and the corresponding probabilities of these outcomes
Poisson distribution A discrete probability distribution that is useful when n is large and p is small, and when the independent variables occur over a period of time
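The binomial pmf C(n, x) pᕽ(1 - p)ⁿ⁻ˣ and the Poisson pmf e^(-λ)λˣ / x! can be coded with the `math` module. The sketch compares them for a hypothetical large n and small p, using λ = np, to show why the Poisson distribution is useful in that setting.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

print(binomial_pmf(2, 4, 0.5))          # 6 * 0.25 * 0.25 = 0.375

# Large n, small p: Poisson with lam = n * p approximates the binomial
n, p = 1000, 0.002
for x in range(4):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, n * p), 4))
```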
normal distribution curve is bell-shaped
mean, median, and mode are equal and are located at the center of the distribution
normal distribution curve is unimodal (i.e., it has only one mode)
curve is symmetric about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center
curve never touches the x axis. Theoretically, no matter how far in either direction the curve extends, it never meets the x axis, but it gets increasingly closer
total area under a normal distribution curve is equal to 1.00, or 100%
standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1
area under a normal distribution curve is used to solve practical application problems
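Areas under the normal curve can be read from `statistics.NormalDist` (Python 3.8+) instead of a z table. The exam-score distribution N(75, 8) is hypothetical; the last line shows that standardizing with z = (x - μ) / σ gives the same area as the original curve.

```python
from statistics import NormalDist

std_normal = NormalDist()                            # standard normal: mean 0, SD 1
print(std_normal.cdf(0))                             # 0.5: half the area lies left of the mean
print(std_normal.cdf(1.96) - std_normal.cdf(-1.96))  # about 0.95 of the area is within ±1.96

# Hypothetical exam scores ~ N(75, 8): P(X < 91) equals the area left of z = 2
scores = NormalDist(mu=75, sigma=8)
print(scores.cdf(91), std_normal.cdf((91 - 75) / 8))
```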
hypothesis testing Define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion
statistical hypothesis is a conjecture about a population parameter. This conjecture may or may not be true.
null hypothesis symbolized by H₀
null hypothesis is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters
alternative hypothesis symbolized by H₁ or Hₐ
alternative hypothesis is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters
statistical test uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected
test value numerical value obtained from a statistical test
type-I error occurs if you reject the null hypothesis when it is true
type-II error occurs if you do not reject the null hypothesis when it is false
level of significance is the maximum probability of committing a type I error
level of significance This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
critical value separates the critical region from the noncritical region
critical or rejection region is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected
noncritical or nonrejection region is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected
one-tailed test indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean
one-tailed test is either a right-tailed test or left-tailed test, depending on the direction of the inequality of the alternative hypothesis
two-tailed test the null hypothesis should be rejected when the test value is in either of the two critical regions
z test is a statistical test for the mean of a population. It can be used when n ≥ 30, or when the population is normally distributed and σ is known
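A minimal sketch of the z test for a mean with σ known: compute z = (x̄ - μ₀) / (σ / √n), get the critical value for the chosen α and tail from `statistics.NormalDist().inv_cdf`, and reject H₀ when the test value falls in the critical region. The function name and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """Return (z, critical value, reject H0?) for H0: mu = mu0 with sigma known."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    std = NormalDist()
    if tail == "two":
        crit = std.inv_cdf(1 - alpha / 2)   # critical regions in both tails
        reject = abs(z) > crit
    elif tail == "right":
        crit = std.inv_cdf(1 - alpha)
        reject = z > crit
    elif tail == "left":
        crit = std.inv_cdf(alpha)
        reject = z < crit
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, crit, reject

# Hypothetical: H0: mu = 50, sigma = 6, n = 36, observed sample mean 53
print(z_test(53, 50, 6, 36))   # z = 3.0 exceeds about 1.96, so H0 is rejected
```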
t test When the population standard deviation is unknown, t test is used. The distribution of the variable should be approximately normal.
t distribution is similar to the standard normal distribution
t distribution is bell-shaped
t distribution is symmetric about the mean
t distribution The mean, median, and mode are equal to 0 and are located at the center of the distribution
t distribution The curve never touches the x axis
t distribution The variance is greater than 1
t distribution is a family of curves based on the degrees of freedom, which is a number related to sample size
t distribution As the sample size increases, the t distribution approaches the normal distribution
t test is a statistical test for the mean of a population
t test is used when the population is normally or approximately normally distributed, σ is unknown, or when the sample is small, i.e., n < 30
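The standard library has no t distribution, so this sketch computes the t test value t = (x̄ - μ₀) / (s / √n) with df = n - 1 and compares it against a critical value taken from a t table (2.262 for df = 9, two-tailed α = 0.05). The sample data and hypotheses are hypothetical.

```python
import math
import statistics

def t_statistic(sample, mu0):
    """t = (x̄ - mu0) / (s / sqrt(n)) with s the sample SD; df = n - 1."""
    n = len(sample)
    t = (statistics.mean(sample) - mu0) / (statistics.stdev(sample) / math.sqrt(n))
    return t, n - 1

# Hypothetical data: 10 measurements, H0: mu = 12 vs H1: mu != 12, alpha = 0.05
sample = [12, 14, 15, 13, 16, 14, 15, 13, 17, 11]
t, df = t_statistic(sample, 12)
critical = 2.262   # two-tailed t-table value for df = 9, alpha = 0.05
print(round(t, 3), df, "reject H0" if abs(t) > critical else "do not reject H0")
```

`statistics.stdev` divides by n - 1, which is the sample standard deviation the t test requires.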
Created by: reeper
 

 


