STATISTICS REVIEW

Term	Definition
Statistics Etymology	The word statistics is derived from the Latin word status, meaning “state”.
Statistics	is a science that deals with the collection, presentation, analysis, and interpretation of data.
Collection	refers to the gathering of information or data.
Organization or Presentation	involves summarizing data in textual, graphical or tabular form.
Analysis	involves describing the data by statistical methods or procedures.
Interpretation	refers to the process of making conclusions based on the analyzed data
variable	is a characteristic or attribute that can assume different values.
Data	are the values (measurements or observations) that the variables can assume.
random variables	Variables whose values are determined by chance
data set	A collection of data values
data value or datum	Each value in the data set
Descriptive statistics	collection, organization, summarization, and presentation of data.
descriptive statistics	the statistician tries to describe a situation.
Inferential statistics	generalizing from samples to populations, performing estimations and hypothesis tests, determining relationships among variables, and making predictions.
Inferential statistics	the statistician tries to make inferences from samples to populations
Inferential statistics	uses probability
population	consists of all subjects (human or otherwise) that are being studied.
sample	is a group of subjects selected from a population.
parameter	is a numerical summary or any measurement coming from a population.
statistic	is a measurement from a sample.
Qualitative variables	are variables that can be placed into distinct categories, according to some characteristic or attribute.
Quantitative Variables	are numerical in nature.
Quantitative Variables	These are obtained from counting (discrete) or measuring (continuous).
Quantitative Variables	meaningful arithmetic operations can be done with these kinds of data.
Dependent Variable	a variable, which is affected or influenced by another variable.
Independent Variable	one that affects, or influences another variable
nominal scale	the categories of a qualitative variable are unordered
nominal scale	is used when we want to distinguish one object from another for identification purposes. We can only say that one object is different from another, but the amount of difference cannot be determined.
ordinal scale	the categories of a qualitative variable can be put in order
ordinal scale	data are arranged in some specific order or rank. When objects are measured in this level, we can say that one is greater than the other, but we cannot tell how much more one has than the other.
interval scale	one can compare the differences between measurements of the quantitative variable meaningfully, but not the ratio of the measurements
interval scale	Data are measured so that we can say not only that one value is greater or less than another, but also specify the amount of difference.
ratio scale	one can compare both the differences between measurements of the quantitative variable and the ratio of the measurements meaningfully
ratio scale	the level always starts from the absolute or true zero point.
Data	consists of information coming from observations, counts, measurements, or responses
Primary	Collected specifically for the analysis desired. Most common type is doing a survey
Secondary	have already been collected/compiled and are available for statistical analysis
Survey Study	systematic method for gathering information. It is an investigation of one or more characteristics of a population
The Direct or Interview Method	In this method, the researcher has a direct contact with the interviewee.
The Direct or Interview Method	The researcher obtains the information needed by asking questions and inquiries from the interviewee
The Direct or Interview Method	This method gives precise and consistent information because clarifications can be made.
The Direct or Interview Method	this method is time consuming, expensive, and has limited field coverage.
The Indirect or Questionnaire Method	This method makes use of a written questionnaire.
The Indirect or Questionnaire Method	The researcher distributes the questionnaire to the respondents either by personal delivery or by mail.
The Indirect or Questionnaire Method	Using this method, the researcher can save a lot of time and money in gathering the information needed because questionnaires can be given to a large number of respondents at the same time.
The Indirect or Questionnaire Method	the researcher cannot expect that all distributed questionnaires will be retrieved because some respondents simply ignore the questionnaires.
The Indirect or Questionnaire Method	In addition, clarification cannot be made if the respondent does not understand the question.
The Registration Method	This method of collecting data is governed by laws.
Retrospective Study	Uses either all or a sample of existing data, which can also be called historical data.
Retrospective Study	Quickest and easiest way to collect process data.
Retrospective Study	Provides limited information.
a type of primary data	Internal data recorded by a company, such as sales and transactions
Primary data	is data that is collected directly from the source for a specific purpose or research question, and it has not been previously collected or analyzed by others.
Observational Study	Simply observes the process or population during a period of routine operation
Observational Study	Researcher interacts/disturbs the process only as much as is required to obtain data on the system.
Observational Study	May give valuable information, but it is usually limited because you just altered a part of the system
The Experimental Design	This method is usually used to find out cause and effect relationships.
The Experimental Design	Scientific researchers often use this method.
The Experimental Design	We can establish cause-and-effect relationship unlike retrospective and observational studies where we are just informed about any interesting phenomena.
Simulation Study	Cost-effective, time efficient, safe-testing, increased understanding, optimization
Simulation data gathering	refers to the process of collecting data from a simulation, which is a computer model of a system that mimics its behavior.
Simulation	is a powerful technique and can be used to model many different types of systems.
Requirements of a Good Sample	a “scaled-down” version of the population, mirroring every characteristic of the whole population.
Observation Unit/element	basic unit of observation; an object on which a measurement is taken.
Target Population	the complete collection of observations we want to study.
Sample	subset of a population.
Sampled Population	the collection of all possible observation units that might have been chosen in a sample.
Sampling unit	a unit that can be selected for a sample.
Sampling frame	A list, map, or other specification of sampling units in the population from which a sample may be selected.
Selection Bias	It occurs when some part of the target population is not in the sampled population, or, more generally, when some population units are sampled at a different rate than intended by the investigator
A good sample	will be as free from selection bias as possible and will have accurate responses to the items of interest
Measurement Error	When a response in the survey differs from the true value
Sampling error	the error that results from taking one sample instead of examining the whole population.
Non-sampling error	selection bias and measurement error are types of non-sampling error.
Non-sampling error	These are the errors that cannot be attributed to the sample-to-sample variability.
Simple probability samples	each unit in the population has a known probability of selection
Simple probability samples	random number table or other randomization mechanism is used to choose the specific units to be included in the sample.
Simple probability samples	Investigator can use a relatively small sample to make inferences about an arbitrarily large population
Simple random sampling	The most basic form of probability sampling and provides theoretical basis for the more complicated forms.
Simple Random Sample with Replacement (SRSWR)	One unit is randomly selected from the population to be the first sampled unit, with probability 1/N. The unit is then returned to the population before the next draw, so the sample might include duplicates.
Simple Random Sample without Replacement (SRSWOR)	is much more preferred than Simple Random Sample with Replacement (SRSWR)
Simple Random Sample without Replacement (SRSWOR)	This sample is selected so that every possible subset of n distinct units in the population has the same probability of being selected as the sample.
Systematic sampling	It is used as a proxy for simple random sampling when no list of the population exists.
Systematic sampling	Selection of individuals is based on pre-determined interval (k) or sampling interval and we choose a random starting point.
Stratified random sampling	To stratify a population means to classify or to separate people into groups according to some of their characteristics, such as rank, income, education, sex, or ethnic background.
strata	It partitions population into subclasses with notable distinctions
Cluster sampling	is similar to stratified random sampling in that the total population is divided into groups (clusters); a simple random sample of clusters is then selected, and the units in the selected clusters are observed.
Cluster	is usually based on geographic area.
Non-probability sampling	is a method of selecting sampling units from a target population using a subjective or non-random method.
Convenience sampling	The sample is selected based on accessibility or convenience.
Convenience sampling	It is the least effective of the non-probability sampling methods but if there are logistics and time constraints, it may be the only option.
Purposive Sampling	A method for obtaining sample units in which researchers use their expertise to choose qualified participants whose survey responses will help the research study meet its goals.
Purposive Sampling	The researchers pick these participants purposively.
Quota sampling	The sample is selected based on certain quotas or predetermined criteria, such as age, educational attainment, gender or income level.
Quota sampling	is one of the most preferred methods of non-probability sampling because it forces the inclusion of members of different subpopulations.
Snowball sampling	The sample is selected based on referrals from other members of the population.
Snowball sampling	This type of sampling is used if the population of interest is hard to find, such as people with disabilities or certain diseases, drug users, or victims of a specific crime.
raw data	Information obtained by observing values of a variable
qualitative data	Data obtained by observing values of a qualitative variable
quantitative data	Data obtained by observing values of a quantitative variable
discrete data	Quantitative data obtained from a discrete variable
continuous data	quantitative data obtained from a continuous variable
Ungrouped data	are data that are not organized, or if arranged, could only be from highest to lowest or lowest to highest
Grouped data	are data that are organized and arranged into different classes or categories.
Tabular method	By organizing the data in tables, important features about the data can be readily understood and comparisons are easily made.
Table Heading	consists of the table number and the title
Column Header	It describes the data in each column.
Row Classifier	It shows the classes or categories.
Body	This is the main part of the table.
Source Note	This is placed below the table when the data written are not original.
Frequency Distribution Table	The most commonly used method in presenting data by tabular method
frequency distribution	is the organization of raw data in table form, using classes and frequencies
Frequency Distribution Table (FDT)	is a statistical table showing the frequency or number of observations contained in each of the defined classes or categories.
frequency distribution for qualitative data	lists all categories and the number of elements that belong to each of the categories
relative frequency	is obtained by dividing the frequency (𝑓) for a category by the sum of all the frequencies (𝑛)
relative frequency	They are commonly expressed as percentages
Class Limits	endpoints of a class interval
Upper Class Limit	represents the largest data value that can be included in the class.
Lower Class Limit	represents the smallest data value that can be included in the class.
Class boundaries	used to separate the classes so that there are no gaps in the frequency distribution
Lower boundary	Lower Limit – 0.5
Upper boundary	Upper limit + 0.5
Class width (i)	the difference between the boundaries of any class, i.e., i = upper boundary − lower boundary, or i = (upper limit − lower limit) + 1 for whole-number data
Class mark	the midpoint of the class
less than cumulative frequency (<cf)	total number of observations less than the upper boundary of a class interval
greater than cumulative frequency (>cf)	total number of observations greater than the lower boundary of a class interval
Graphical Method	The purpose of graphs in statistics is to convey the data to the viewers in pictorial form.
Graphical Method	It is easier for most people to comprehend the meaning of data presented graphically than data presented numerically in tables or frequency distributions.
Bar Graph	is a graph composed of bars whose heights are the frequencies of the different categories.
Bar Graph	displays graphically the same information concerning qualitative data that a frequency distribution shows in tabular form.
Pie Chart	is also used to graphically display qualitative data
Histogram	is a graph that displays the data by using contiguous vertical bars (unless the frequency of a class is 0) of various heights to represent the frequencies of the classes.
Frequency Polygon	is a graph that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
Ogive	is a graph that represents the cumulative frequencies for the classes in a frequency distribution.
Ogive	is a graph in which a point is plotted above each class boundary at a height equal to the cumulative frequency corresponding to that boundary.
measure of central tendency	gives a single value that acts as a representative or average of the values of all the outcomes of your data set.
central tendency	is a statistical measure that determines a single value that accurately describes the center of the distribution and represents the entire distribution of scores or measures.
goal of central tendency	is to identify the single value that is the best representative for the entire set of data.
measure of central tendency	Any measure indicating the center of a set of data, arranged in an increasing or decreasing order of magnitude
mean, median, and mode	most commonly used measures
Mean	is the most commonly used measure of central tendency.
Computation of the mean	requires scores that are numerical values measured on an interval or ratio scale.
Mean	is obtained by computing the sum, or total, for the entire set of scores(data), then dividing this sum by the number of scores.
Median	is defined as the midpoint of the list.
Median	divides the scores so that 50% of the scores in the distribution have values that are equal to or less than the median
Median	Computation of the median requires scores that can be placed in rank order (smallest to largest) and are measured on an ordinal, interval, or ratio scale
Mode	is defined as the most frequently occurring category or score in the distribution.
Mode	is the category or score corresponding to the peak or high point of the distribution.
Mode	is not unique. A data set can have more than one mode.
Measures of Variability or Dispersion	are measures of the average distance of each observation from the center of the distribution.
Measures of Variability or Dispersion	tell us how spread out the scores are
Measures of Variability or Dispersion	They summarize and describe the extent to which scores in a distribution differ from each other.
Measures of absolute dispersion	Range, Variance, Standard Deviation
Measures of relative dispersion	coefficient of variation
Measures of Absolute Dispersion	are expressed in the units of the original observations.
Measures of Absolute Dispersion	They cannot be used to compare variations of two data sets when the averages of these sets differ a lot in value or when the observations differ in units of measurements.
Range	is the difference between the highest and the lowest values.
Range	This is the simplest but most unreliable measure of dispersion since it only uses two values in the distribution
Variance	is the average of the squared deviation of each score from the mean
Standard Deviation	is the square root of the average of the squared deviation of each score from the mean, or simply, the square root of the variance.
Measures of Relative Dispersion	are unitless measures and are used when one wishes to compare the scatter of one distribution with another distribution.
Coefficient of Variation	is the ratio of the standard deviation to the mean and is usually expressed in percentage
Coefficient of Variation	It is used to compare variability of two or more sets of data even when they are expressed in different units of measurements.
fractiles or quantiles	are values below which a specific fraction or percentage of the observations in a given set must fall.
Quartiles, Deciles and Percentiles	fractiles of special interest
Measures of Location or Position	several measures that describe or locate the position of non-central pieces of data relative to the entire set of data.
Standard z-Score	measures how many standard deviations an observation is above or below the mean
Percentiles	are values that divide a set of observations into 100 equal parts
Deciles	are values that divide a set of observations into 10 equal parts
Quartiles	are values that divide a set of observations into 4 equal parts
Measures of Shapes	describe the shape of a certain distribution
Histogram	can give a general idea of the shape
skewness and kurtosis	two numerical measures of shape that can give a more precise evaluation
Skewness	refers to the degree of symmetry or asymmetry of a distribution
normal distribution	is bell-shaped and symmetric about the mean
normal distribution	It has the property mean=median=mode
distribution skewed to the left	the mean is less than the median
negatively skewed	The bulk of the distribution is on the right
distribution skewed to the right	the mean is greater than the median
positively skewed	The bulk of the distribution is on the left
Kurtosis	refers to the peakedness or flatness of a distribution
Mesokurtic	is a normal distribution
Leptokurtic	is more peaked than the normal distribution
Platykurtic	is flatter than the normal distribution
experiment	is the process of observing a phenomenon that has variation in its outcomes
experiment	It is a well-defined action leading to a single, well-defined result
outcome	is a result from a single trial of an experiment
Sample Space	The set of all possible outcomes of an experiment
event	is a collection of one or more outcomes of an experiment, i.e., a subset of the sample space
simple event	An event containing only one element
compound event	is one that can be expressed as a union of simple events.
null space or empty space	is a subset of the sample space that contains no elements
null space or empty space	It is denoted by ∅
union of two events A and B	denoted by A ∪ B
union of two events A and B	is the event containing all elements that belong to A or to B, or to both
complement of an event A	is the set of all elements of S that are not in A
intersection of two events A and B	denoted by A ∩ B
intersection of two events A and B	is the event containing all elements common to both A and B
A and B have no elements in common	Two events A and B are mutually exclusive if 𝐴 ∩ 𝐵 = ∅
Multiplication Rule	If an operation can be performed in n₁ ways, and if for each of these a second operation can be performed in n₂ ways, then the two operations can be performed together in n₁n₂ ways
Generalized Multiplication Rule	If an operation can be performed in n₁ ways, and if for each of these a second operation can be performed in n₂ ways, and for each of the first two a third operation can be performed in n₃ ways, and so forth, then the sequence of k operations can be performed in n₁n₂⋯nₖ ways
Permutation	is an ordered arrangement of all or part of a set of objects.
Combination	is a selection of objects without regard to order
Probability	refers to the likelihood of occurrence of an event
Subjective Probability	chance of occurrence is given by a particular person based on his/her educated guess, opinion, intuition or beliefs
Empirical Probability	probability is assigned based on prior knowledge of events that happened in the past, or based on research or experiment
Classical Probability	applied when all possible outcomes are equally likely to happen
Conditional Probability (that B occurs given A has occurred)	The Probability of an event B occurring when it is known that some event A has occurred
Independent Events	A occurring does not affect the probability of B occurring
Random Variable	is a variable whose values are determined by chance
Discrete Random Variables	have a finite number of possible values or an infinite number of values that can be counted
Continuous Random Variables	variables that can assume all values in the interval between any two given values
discrete probability distribution	consists of the values a random variable can assume and the corresponding probabilities of the values
discrete probability distribution	The probabilities are determined theoretically or by observation.
probability mass function	The probability distribution of a discrete random variable
family of probability distributions	The collection of all probability distributions for different values of the parameter
parameter of the distribution	a quantity that can be assigned any one of a number of possible values, with each different value determining a different probability distribution
cumulative distribution function	𝐹(𝑥) of a discrete random variable 𝑋 with pmf 𝑝(𝑥) is defined for every number 𝑥 by $F(x) = P(X \le x) = \sum_{y \le x} p(y)$
mean or expected value of a probability distribution	is the theoretical average of the variable
binomial distribution	The outcomes of a binomial experiment and the corresponding probabilities of these outcomes
Poisson distribution	A discrete probability distribution that is useful when n is large and p is small, and when the independent variables occur over a period of time
normal distribution curve	is bell-shaped
mean, median, and mode	are equal and are located at the center of the distribution
normal distribution curve	is unimodal (i.e., it has only one mode)
curve	is symmetric about the mean, which is equivalent to saying that its shape is the same on both sides of a vertical line passing through the center
curve	never touches the x axis. Theoretically, no matter how far in either direction the curve extends, it never meets the x axis—but it gets increasingly closer
total area under a normal distribution curve	is equal to 1.00, or 100%
standard normal distribution	is a normal distribution with a mean of 0 and a standard deviation of 1
area under a normal distribution curve	is used to solve practical application problems
hypothesis testing	Define the population under study, state the particular hypotheses that will be investigated, give the significance level, select a sample from the population, collect the data, perform the calculations required for the statistical test, and reach a conclusion
statistical hypothesis	is a conjecture about a population parameter. This conjecture may or may not be true.
null hypothesis	symbolized by H₀
null hypothesis	is a statistical hypothesis that states that there is no difference between a parameter and a specific value, or that there is no difference between two parameters
alternative hypothesis	symbolized by H₁ or Hₐ
alternative hypothesis	is a statistical hypothesis that states the existence of a difference between a parameter and a specific value, or states that there is a difference between two parameters
statistical test	uses the data obtained from a sample to make a decision about whether the null hypothesis should be rejected
test value	numerical value obtained from a statistical test
type-I error	occurs if you reject the null hypothesis when it is true
type-II error	occurs if you do not reject the null hypothesis when it is false
level of significance	is the maximum probability of committing a type I error
level of significance	This probability is symbolized by α (Greek letter alpha). That is, P(type I error) = α.
critical value	separates the critical region from the noncritical region
critical or rejection region	is the range of values of the test value that indicates that there is a significant difference and that the null hypothesis should be rejected
noncritical or nonrejection region	is the range of values of the test value that indicates that the difference was probably due to chance and that the null hypothesis should not be rejected
one-tailed test	indicates that the null hypothesis should be rejected when the test value is in the critical region on one side of the mean
one-tailed test	is either a right-tailed test or left-tailed test, depending on the direction of the inequality of the alternative hypothesis
two-tailed test	the null hypothesis should be rejected when the test value is in either of the two critical regions
z test	is a statistical test for the mean of a population. It can be used when n ≥ 30, or when the population is normally distributed and σ is known
t test	When the population standard deviation is unknown, the t test is used. The distribution of the variable should be approximately normal.
t distribution	is similar to the standard normal distribution
t distribution	is bell-shaped
t distribution	is symmetric about the mean
t distribution	The mean, median, and mode are equal to 0 and are located at the center of the distribution
t distribution	The curve never touches the x axis
t distribution	The variance is greater than 1
t distribution	is a family of curves based on the degrees of freedom, which is a number related to sample size
t distribution	As the sample size increases, the t distribution approaches the normal distribution
t test	is a statistical test for the mean of a population
t test	is used when the population is normally or approximately normally distributed and 𝜎 is unknown, or when the sample is small, i.e., 𝑛 < 30
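
The frequency-distribution cards above (relative frequency, class boundaries, class width, class mark, and the two cumulative frequencies) can be sketched in a few lines of Python. This is only an illustration: the function names and the sample data are made up for this example, and the ±0.5 boundary rule assumes whole-number data, as on the cards.

```python
from collections import Counter
from itertools import accumulate


def relative_frequencies(values):
    """Relative frequency = frequency f of a category divided by n, the sum of all frequencies."""
    counts = Counter(values)
    n = sum(counts.values())
    return {category: f / n for category, f in counts.items()}


def class_summary(lower_limit, upper_limit):
    """Boundaries, width, and mark of one class (assumes whole-number data)."""
    lower_boundary = lower_limit - 0.5
    upper_boundary = upper_limit + 0.5
    return {
        "boundaries": (lower_boundary, upper_boundary),
        "width": upper_boundary - lower_boundary,
        "mark": (lower_limit + upper_limit) / 2,  # midpoint of the class
    }


def less_than_cf(frequencies):
    """<cf: running total of class frequencies from the first class down."""
    return list(accumulate(frequencies))


def greater_than_cf(frequencies):
    """>cf: running total of class frequencies from the last class up."""
    return list(accumulate(reversed(frequencies)))[::-1]


# Hypothetical data, for illustration only.
blood_types = ["A", "B", "O", "O", "A", "O", "AB", "O"]
class_frequencies = [3, 5, 2]

print(relative_frequencies(blood_types))
print(class_summary(10, 14))
print(less_than_cf(class_frequencies), greater_than_cf(class_frequencies))
```

Multiplying each relative frequency by 100 gives the percentage form mentioned on the card.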
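
As a rough illustration of the central tendency, dispersion, and z-score cards, the sketch below uses Python's standard `statistics` module on a small made-up data set. It follows the cards' wording, so the variance is the population form (the average of the squared deviations); the sample form, which divides by n − 1, is available as `statistics.variance`.

```python
import statistics

# Hypothetical scores, for illustration only.
scores = [2, 4, 4, 5, 7, 8]

mean = statistics.mean(scores)          # sum of scores divided by the number of scores
median = statistics.median(scores)      # midpoint of the ordered list
modes = statistics.multimode(scores)    # a list, because the mode is not unique

data_range = max(scores) - min(scores)  # highest value minus lowest value
variance = statistics.pvariance(scores) # average squared deviation from the mean
std_dev = statistics.pstdev(scores)     # square root of the variance
cv_percent = std_dev / mean * 100       # coefficient of variation, in percent


def z_score(x):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


print(mean, median, modes)
print(data_range, variance, std_dev, cv_percent)
print(z_score(8))
```

Because the coefficient of variation is unitless, it can be compared across data sets measured in different units, which the range, variance, and standard deviation cannot.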
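
The counting cards (multiplication rule, permutation, combination) can be checked numerically. The sketch below relies on `math.prod`, `math.perm`, and `math.comb` from the Python standard library (Python 3.8 or later); the numbers are arbitrary examples.

```python
import math

# Generalized multiplication rule: k operations with n1, n2, ..., nk ways each.
# Hypothetical example: 3 shirts, 4 pairs of pants, 2 pairs of shoes.
ways_per_operation = [3, 4, 2]
total_outfits = math.prod(ways_per_operation)

# Permutation: ordered arrangements of r objects taken from n.
ordered_arrangements = math.perm(5, 3)

# Combination: selections of r objects from n, without regard to order.
unordered_selections = math.comb(5, 3)

print(total_outfits, ordered_arrangements, unordered_selections)
```

Each unordered selection of 3 objects corresponds to 3! = 6 ordered arrangements, which is why the permutation count is six times the combination count here.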
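
The probability-sampling cards can be sketched with Python's `random` module. This is a minimal sketch under stated assumptions: the population is a made-up list of unit labels 1 to 100, the function names are illustrative, and the systematic sampler assumes the population size is a multiple of the sample size so that the interval k is a whole number.

```python
import random


def srs_without_replacement(population, n, rng):
    """SRSWOR: every subset of n distinct units is equally likely."""
    return rng.sample(population, n)


def srs_with_replacement(population, n, rng):
    """SRSWR: each draw picks any unit with probability 1/N; duplicates are possible."""
    return rng.choices(population, k=n)


def systematic_sample(population, n, rng):
    """Pick a random start among the first k units, then take every k-th unit."""
    k = len(population) // n   # sampling interval
    start = rng.randrange(k)   # random starting point
    return population[start::k][:n]


rng = random.Random(2024)              # seeded only to make the run repeatable
population = list(range(1, 101))       # hypothetical unit labels 1..100

print(srs_without_replacement(population, 10, rng))
print(srs_with_replacement(population, 10, rng))
print(systematic_sample(population, 10, rng))
```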
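
For the discrete probability distribution cards, the cumulative distribution function adds up the pmf over all values not exceeding x, and the expected value is the probability-weighted average of the values. The sketch below uses a made-up example, the number of heads in two tosses of a fair coin, and exact fractions to avoid rounding.

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin tosses (illustrative example).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}


def cdf(x):
    """F(x) = P(X <= x): sum of p(y) over all values y with y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))


# Mean (expected value): theoretical average of the variable.
expected_value = sum(y * p for y, p in pmf.items())

print(cdf(0), cdf(1), cdf(2), expected_value)
```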
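
The hypothesis-testing cards (test value, critical value, rejection region, two-tailed test) fit together as in the sketch below, which runs a z test for a population mean. All numbers are hypothetical, the function name is illustrative, and the population standard deviation σ is assumed known. The test value is computed as z = (x̄ − μ₀) / (σ / √n), the usual z statistic for a mean, and the critical value comes from `statistics.NormalDist` in the standard library.

```python
import math
from statistics import NormalDist


def two_tailed_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Return (test value, critical value, reject H0?) for H0: mu = mu_0."""
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu_0) / standard_error
    # Critical value that leaves alpha/2 in each tail of the standard normal curve.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Reject H0 when the test value falls in either critical region.
    return z, critical, abs(z) > critical


# Hypothetical study: n = 36, sample mean 52, claimed mean 50, sigma = 6.
z, critical, reject = two_tailed_z_test(52, 50, 6, 36)
print(round(z, 2), round(critical, 2), reject)
```

For a one-tailed test the whole of α sits in a single tail, so the critical value would instead be `NormalDist().inv_cdf(1 - alpha)` for a right-tailed test, or its negative for a left-tailed test.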

Created by: reeper
