Statistics

A population

is the set of all the possible items to be observed. Example: when investigating the height of males in Wales, the population would be the heights of all the males in Wales.

Random sampling:

This method gives every item of the population an equal chance of selection. This can be done in various ways, for example by picking names out of a hat or by using a random number generator on a calculator.

Stratified sampling:

Some populations are naturally split into a number of strata (a bit like sub-groups). We can separate the strata and find what proportion of the population is in each stratum. We can then select a random sample from each stratum, with a size proportional to the size of that stratum.
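
The proportional allocation described above can be sketched in Python. This is only an illustration: the function name `stratified_sample`, the two year groups and the sample size of 10 are all made-up assumptions, not part of the original notes.

```python
import random

def stratified_sample(strata, sample_size, seed=0):
    """Pick a random sample from each stratum, proportional to its size."""
    rng = random.Random(seed)
    total = sum(len(items) for items in strata.values())
    sample = {}
    for name, items in strata.items():
        # Share of the sample given to this stratum, rounded to a whole number.
        k = round(sample_size * len(items) / total)
        sample[name] = rng.sample(items, k)
    return sample

# Hypothetical population: 60 pupils in year 7 and 40 pupils in year 8.
strata = {"year7": list(range(60)), "year8": list(range(100, 140))}
sample = stratified_sample(strata, sample_size=10)
print({name: len(chosen) for name, chosen in sample.items()})
```

Because each share is rounded, the stratum sizes may not always add up exactly to the requested sample size; here 60% and 40% of 10 give 6 and 4.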

measure of central tendency

is just a mathematical and rather posh way of saying "averages".

The Mode

It is the piece or pieces of data that occur most often.

The Median

The median is the middle piece of data when the data is in numerical order. With an even number of values, say 50 pieces of data, we must find the halfway position and the next value: in this case, the 25th and 26th values. The median will be halfway between these two values.
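
As a rough sketch of the rule above, the following Python function handles both the odd and the even case. The data set (the numbers 1 to 50) is invented purely to match the "50 pieces of data" example.

```python
def median(values):
    """Return the middle value, or the halfway point of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Even count: average the two middle values (the 25th and 26th when n = 50).
    return (ordered[mid - 1] + ordered[mid]) / 2

data = list(range(1, 51))  # 50 made-up values: 1, 2, ..., 50
print(median(data))
```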

The Mean

The mean of a set of data is the sum of all the values divided by the number of values: $\bar{x} = \frac{\sum x}{n}$

$\sum x$

just means the sum of all the x's: add all the pieces of data together.

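Grouped frequency table: finding the mean

With grouped data we do not know the exact values, so we cannot calculate the exact mean. We can, however, find an estimate of the mean by assuming each item sits halfway within its interval (for example, each footballer is taken to have the height at the midpoint of his class). This gives $\bar{x} \approx \frac{\sum fx}{\sum f}$, where x is the class midpoint and f is the frequency.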
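
A minimal sketch of the midpoint estimate, assuming a made-up table of footballers' heights in centimetres (the class boundaries and frequencies below are not from the original notes):

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data from (lower, upper, frequency) rows."""
    total_fx = 0.0
    total_f = 0
    for lower, upper, f in classes:
        midpoint = (lower + upper) / 2  # assume every item sits at the midpoint
        total_fx += f * midpoint
        total_f += f
    return total_fx / total_f

# Hypothetical grouped frequency table: (lower bound, upper bound, frequency).
heights = [(160, 170, 3), (170, 180, 5), (180, 190, 2)]
estimate = estimated_mean(heights)
print(estimate)
```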

(f)

means frequency

σ definition

Standard deviation: gives a measure of how the data is dispersed about the mean. The lower the standard deviation, the more compact our data is around the mean.

Formula for standard deviation

The square root of (the sum of x² divided by the number of values, minus the square of the mean): $\sigma = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$

σ² definition

The variance is the square of the standard deviation.
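
The standard deviation and variance cards above can be checked with a short Python sketch. It uses the "mean of the squares minus the square of the mean" form; the eight data values are invented for illustration.

```python
import math

def variance(values):
    """Variance: mean of the squares minus the square of the mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) / n - mean ** 2

def standard_deviation(values):
    """Standard deviation: the square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data with mean 5
print(variance(data), standard_deviation(data))
```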

The variance

is the square of the standard deviation.

Stem-and-leaf diagram

A way of presenting data quickly and easily that helps us spot patterns in the spread of the data. They are best used when we have a relatively small set of data and want to find the median or quartiles.

Box-and-whisker plots (or boxplots)

These are very basic diagrams used to highlight the quartiles and median to give a quick and clear way of presenting the spread of the data.

1. The ‘box’ part is drawn from the lower quartile to the upper quartile. The median is then drawn within the box.
2. The ‘box’ shows the inter-quartile range, which houses the middle half of the data.
3. The ‘whiskers’ are then drawn to the lowest and highest values.

Negatively skewed distribution:

There is a greater proportion of the data at the upper end.

Positively skewed distribution:

There is a greater proportion of the data at the lower end.

Outliers

Values of data are usually labelled as outliers if they are more than 1.5 times the inter-quartile range below the lower quartile or above the upper quartile.
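
One way to apply the 1.5 × inter-quartile range rule is sketched below. The quartiles are passed in rather than computed, because different courses use slightly different quartile conventions; the data and the quartile values 11 and 17 are assumptions made up for this example.

```python
def find_outliers(values, lower_quartile, upper_quartile):
    """Return values more than 1.5 x IQR beyond either quartile."""
    iqr = upper_quartile - lower_quartile
    low_fence = lower_quartile - 1.5 * iqr
    high_fence = upper_quartile + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]

data = [1, 10, 12, 13, 15, 16, 18, 40]  # made-up data
# Assumed quartiles: IQR = 6, so the fences are at 2 and 26.
outliers = find_outliers(data, lower_quartile=11, upper_quartile=17)
print(outliers)
```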

Histograms

Histograms are best used for large sets of data, especially when the data has been grouped into classes. They look a little like bar charts or frequency diagrams. In histograms, the frequency of the data is shown by the area of the bars and not just by their height.

Frequency density

The vertical axis of a histogram is labelled frequency density, where frequency density = frequency ÷ class width.
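
A small sketch of the frequency density calculation, using an invented table with unequal class widths. Multiplying each density back by its class width recovers the frequency, which is why the area of each bar represents the frequency.

```python
def frequency_densities(classes):
    """Frequency density = frequency / class width for (lower, upper, f) rows."""
    return [f / (upper - lower) for lower, upper, f in classes]

# Hypothetical classes: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 30, 10), (30, 35, 10)]
densities = frequency_densities(classes)

# Bar area = density x width, which should give back each frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
print(densities, areas)
```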

Cumulative frequency

is kind of like a running total. We add each frequency to the ones before it to get an ‘up to this point’ total.
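
The running total can be sketched with `itertools.accumulate` from the Python standard library. The frequencies and upper class boundaries below are made up; pairing them gives the points that would be plotted on a cumulative frequency curve.

```python
from itertools import accumulate

upper_boundaries = [170, 180, 190]  # hypothetical upper class boundaries
frequencies = [3, 5, 2]             # hypothetical class frequencies

# Each entry is the total of all frequencies up to and including that class.
cumulative = list(accumulate(frequencies))

# Points to plot: (upper class boundary, cumulative frequency).
points = list(zip(upper_boundaries, cumulative))
print(points)
```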

Cumulative frequency curve

The cumulative frequencies (running totals) are plotted against the upper class boundaries to give us a cumulative frequency curve.

P(A)

The probability that an event, A, will happen is written as

complement of A

The event that A does not happen is called the complement of A and is written as A'. Its probability is P(A') = 1 - P(A).

mutually exclusive

Two events are mutually exclusive if one happening excludes the other from happening: they cannot both happen simultaneously. Example: when a fair die is rolled, find the probability of rolling a 4 or a 1. P(4 ∪ 1) = P(4) + P(1) = 1/6 + 1/6 = 1/3

Questions on exclusive events will involve the words ‘or’, ‘either’ or something which implies ‘or’. Remember ‘OR’ means ‘add’. P(A or B) = P(A) + P(B), written P(A ∪ B) = P(A) + P(B), and P(A ∪ B ∪ C) = P(A) + P(B) + P(C)

Independent Events

Two events are independent if the occurrence of one does not affect the occurrence of the other. P(A and B) = P(A) × P(B), written P(A ∩ B) = P(A) × P(B). Questions on independent events will involve the words ‘and’, ‘both’ or something which implies ‘and’. Remember ‘AND’ means ‘multiply’.

Example: a coin is flipped at the same time as a die is rolled. Find the probability of obtaining a head and a 5. P(H ∩ 5) = P(H) × P(5) = 1/2 × 1/6 = 1/12
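
Both rules (‘OR’ means add for mutually exclusive events, ‘AND’ means multiply for independent events) can be checked with exact fractions. The sketch below reuses the die and coin examples from the cards and also counts the equally likely outcomes as a cross-check; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["H", "T"]

# Mutually exclusive events: rolling a 4 OR a 1, so add the probabilities.
p_4_or_1 = Fraction(1, 6) + Fraction(1, 6)

# Independent events: a head AND a 5, so multiply the probabilities.
p_head_and_5 = Fraction(1, 2) * Fraction(1, 6)

# Cross-check by counting the 12 equally likely (coin, die) outcomes.
outcomes = list(product(coin, die))
favourable = [o for o in outcomes if o == ("H", 5)]
counted = Fraction(len(favourable), len(outcomes))
print(p_4_or_1, p_head_and_5, counted)
```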

How do you write: find the probability that it was a rainy day (R), given that he falls (F)?

P(R | F)

P(R | F) = P(R ∩ F) / P(F)
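
A brief sketch of the conditional probability formula. The two probabilities used here (3/25 for rain and a fall together, 3/10 for a fall) are hypothetical numbers chosen only to show the division.

```python
from fractions import Fraction

def conditional(p_both, p_given):
    """P(R | F) = P(R and F) / P(F)."""
    return p_both / p_given

p_rain_and_fall = Fraction(3, 25)  # hypothetical P(R ∩ F)
p_fall = Fraction(3, 10)           # hypothetical P(F)
p_rain_given_fall = conditional(p_rain_and_fall, p_fall)
print(p_rain_given_fall)
```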

discrete random variable

A random variable is a variable which takes numerical values and whose value depends on the outcome of an experiment. It is discrete if it can only take certain values.

Capital letters are used to denote the random variables, whereas lower case letters are used to denote the values that can be obtained.

random variable

is a variable which takes numerical values and whose value depends on the outcome of an experiment

Sum of the probabilities of a discrete random variable

Σ P(X = x) = 1: the outcomes are mutually exclusive and cover every possibility, so the probabilities always sum to 1.

Probability density function

Sometimes we are given a formula to calculate probabilities. We call this the probability density function of X or the p.d.f. of X.

Cumulative distribution function

‘Cumulative’ gives us a kind of running total, so a cumulative distribution function gives us a running total of the probabilities within our probability table. The cumulative distribution function, F(x), of X is defined as: F(x) = P(X ≤ x)
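
As a sketch of the running total, the code below builds F(x) from a small illustrative probability table (values 0 to 3 with probabilities 0.1, 0.2, 0.5 and 0.2). Exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from itertools import accumulate

values = [0, 1, 2, 3]
probabilities = [Fraction(1, 10), Fraction(2, 10), Fraction(5, 10), Fraction(2, 10)]

# F(x) = P(X <= x): add up every probability up to and including x.
cdf = dict(zip(values, accumulate(probabilities)))
print(cdf)
```

The last entry of a cumulative distribution function is always 1, because it covers every possible value.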

Expectation

The expectation is the expected value of X, written as E(X) or sometimes as μ. The expectation is what you would expect to get if you were to carry out the experiment a large number of times and calculate the ‘mean’.

E(X) = Σ x P(X = x). You multiply each value of x by its corresponding probability. If we then add all these up, we obtain the expectation of X.

uniform distribution

This is a ‘special’ discrete random variable as all the probabilities are the same. It is possible to calculate the expectation by using the symmetry of the table. The expectation, E(X), is calculated by finding the halfway point.

symmetry of the table

With uniform distributions it is possible to calculate the expectation by using the symmetry of the table. The expectation, E(X) is calculated by finding the halfway point.

Expectation of any function of X

E[f(X)] = Σ f(x) P(X = x)

E(aX + b) Equals

aE(X) + b

E(a) Equals

a (the expectation of a constant is the constant itself)

variance

is a measure of how spread out the values of X would be if the experiment leading to X were repeated a number of times.

E(X)

The mean, μ. Example of calculation: (0 × 0.1) + (1 × 0.2) + (2 × 0.5) + (3 × 0.2) = 1.8
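
The calculation on this card can be reproduced in a few lines of Python. The helper name `expectation` is an illustrative choice; it also accepts an optional function f so that E[f(X)] can be found in the same way. Exact fractions keep the arithmetic free of rounding error.

```python
from fractions import Fraction

# Probability table from the example: value -> P(X = value).
table = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(5, 10),
    3: Fraction(2, 10),
}

def expectation(table, f=lambda x: x):
    """E[f(X)] = sum of f(x) * P(X = x); with the default f this is E(X)."""
    return sum(f(x) * p for x, p in table.items())

mean = expectation(table)                             # E(X)
mean_of_squares = expectation(table, lambda x: x * x)  # E(X^2)
print(float(mean), float(mean_of_squares))
```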

Var(aX) Equals

a²Var(X)

Example, with Var(X) = 2.5: Var(2X) = 2² × Var(X) = 4 × 2.5 = 10 and Var(4X – 3) = 4² × Var(X) = 16 × 2.5 = 40

Var(aX + b) Equals

a²Var(X): adding the constant b shifts every value but does not change the spread. This means that by knowing just the variance, Var(X), we can calculate other variances quickly. Example, with Var(X) = 2.5:

Var(2X) = 2² × Var(X) = 4 × 2.5 = 10 and Var(4X – 3) = 4² × Var(X) = 16 × 2.5 = 40
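
The rule can be checked numerically by transforming a probability table and recomputing the variance from scratch. This sketch assumes the standard identity Var(X) = E(X²) − [E(X)]² and uses an illustrative table (so Var(X) here is 19/25, not the 2.5 used in the card's example); the helper names are made up.

```python
from fractions import Fraction

def variance(table):
    """Var(X) = E(X^2) - [E(X)]^2 for a table of value -> probability."""
    mean = sum(x * p for x, p in table.items())
    return sum(x * x * p for x, p in table.items()) - mean ** 2

def transform(table, a, b):
    """Probability table of aX + b (a must be non-zero to keep values distinct)."""
    return {a * x + b: p for x, p in table.items()}

table = {
    0: Fraction(1, 10),
    1: Fraction(2, 10),
    2: Fraction(5, 10),
    3: Fraction(2, 10),
}

var_x = variance(table)
var_4x_minus_3 = variance(transform(table, 4, -3))
print(var_x, var_4x_minus_3)
```

The second variance is 16 times the first, matching a² = 4², and the −3 has no effect.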

The Standard Deviation

The square root of the variance is called the standard deviation of X. The standard deviation is given the symbol σ.

Convert any normal distribution of X into the standard normal distribution of Z

Z = (X - μ) / σ
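
A one-line sketch of the conversion. The numbers (a value of 185 from a distribution with mean 175 and standard deviation 5) are hypothetical.

```python
def standardise(x, mu, sigma):
    """Convert a value x from N(mu, sigma^2) into a z-value."""
    return (x - mu) / sigma

z = standardise(185, mu=175, sigma=5)  # hypothetical height data
print(z)
```

A z-value of 2 means the observation lies two standard deviations above the mean.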

Normal Distribution Graph

Much of the data is gathered around the mean. The distribution has a characteristic ‘bell shape’, symmetrical about the mean. The area under the bell shape = 1.

The standard deviation

is an important measure of the spread of our data. The greater the standard deviation, the greater our spread of data.

Φ(z): this Greek letter (phi) just describes the area under the bell curve to the left of the point z, that is P(Z < z).
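
Instead of reading the area from printed tables, it can be computed with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch; the wrapper name `phi` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def phi(z):
    """Area under the standard normal curve to the left of z."""
    return standard_normal.cdf(z)

print(phi(0), phi(2.0))
```

Because the curve is symmetrical about the mean, phi(0) is 0.5 and phi(-z) equals 1 - phi(z).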

Line of best fit

Any line of best fit must go through the point (mean of x, mean of y).

linear correlation

If all (or nearly all) of the points on a scatter diagram seem to lie on a straight line, we say there is linear correlation.

Equation of regression line

Regression line x on y: formula for b

b = Sxy / Syy

Regression line y on x: formula for b

b = Sxy / Sxx
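
A sketch of the y on x calculation, under some stated assumptions: the line is written as y = a + bx, the intercept is found from a = (mean of y) − b × (mean of x), and the summary statistics use the usual definitions Sxx = Σx² − (Σx)²/n and Sxy = Σxy − (Σx)(Σy)/n. The five data points are invented.

```python
def regression_y_on_x(xs, ys):
    """Return (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b = s_xy / s_xx            # gradient: Sxy / Sxx
    a = mean_y - b * mean_x    # forces the line through (mean x, mean y)
    return a, b

xs = [1, 2, 3, 4, 5]  # made-up controlled variable
ys = [2, 4, 5, 4, 5]  # made-up measured variable
a, b = regression_y_on_x(xs, ys)
print(a, b)
```

Choosing a this way means the line always passes through the mean of x and the mean of y.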

Independent/dependent variables

With the above data, x looks to be controlled, whereas y appears to depend on the experiment and on x. In this case, we say that x is an independent variable and y a dependent variable. As x appears controlled and accurate, we only need to calculate the regression line of y on x.

Product moment correlation coefficient

r: a measure of the degree of scatter. It will lie between -1 and 1.

"r"

The product moment correlation coefficient, r, is a measure of the degree of scatter. It will lie between -1 and 1.
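
The cards do not give a formula for r, so the sketch below assumes the standard one, r = Sxy / √(Sxx × Syy), with Sxx, Syy and Sxy defined in the usual way. The data points are invented.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    s_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    s_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1, 2, 3, 4, 5]  # made-up data
ys = [2, 4, 5, 4, 5]
r = pmcc(xs, ys)

# Points lying exactly on a rising straight line give r = 1.
r_perfect = pmcc(xs, [2 * x + 1 for x in xs])
print(round(r, 4), r_perfect)
```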

Calculate E(X)

Σ x P(X = x), or Σ f(x) P(X = x) for the expectation of a function f(X)

Created by: 1sabelle