data analysis
|
|
||||
---|---|---|---|---|---|
useful graphs | scatterplot
can get a sense for the nature of the relationship
|
||||
what to look for in a graph | location, spread and shape of the data; for two variables, the nature of the relationship and any outliers
|
||||
location | where most of the data lies
|
||||
spread | variability of the data, how far apart or close together it is
|
||||
shape | symmetric, skewed, etc.
|
||||
nature of relationship | existent/ non-existent
strong/ weak
increasing/ decreasing
linear/ non-linear
|
||||
outliers in scatterplots | represent some unexplainable anomalies in data
could reveal possible systematic structure worthy of investigation
|
||||
causal relationship | relationship between two variables where one variable causes changes in the other
|
||||
explanatory variable | explains or causes the change
on x-axis
|
||||
response variable | is changed
on y-axis
|
||||
useful numbers | correlation and regression
|
||||
formula for the correlation coefficient | $r = \frac{1}{n-1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right) \left(\frac{y_i - \bar{y}}{s_y}\right)$
|
||||
xi or yi | the i-th observed value of the variable on the x-axis or y-axis, respectively
|
||||
xbar or ybar | mean of the x-values or y-values, respectively
|
||||
sx or sy | standard deviation of the x-values or y-values, respectively
|
||||
properties of r | close to 1 = strong positive linear relationship
close to -1 = strong negative linear relationship
close to 0 = weak or non-existent linear relationship
|
||||
cautions about the use of r | only useful for describing linear relationships
sensitive to outliers
|
||||
regression models | general linear relationships between variables
negative slope = response decreases as the predictor increases
|
||||
what regression modelling does | describes the behaviour of the response variable (the variable of interest) in terms of a collection of predictors (related variables, i.e. the explanatory variable(s))
|
||||
a linear framework is used to look at? | the relationship between the response and the regressors
formula: Y = α + βx
Where α is the intercept and β is the slope
|
||||
ideal model for linear framework in terms of responses and regressors | one unique response to one given regressor
|
||||
real world model for linear framework in terms of responses and regressors | must approximate
|
||||
statistical model | relates response to physical model predictions
allows for better predictions and quantification of uncertainty concerning the response
to make decisions
|
||||
what does regression analysis do? | finds the best relationship between responses and regressors for a particular class of models
|
||||
experimenter controls predictors, why? | may be important for making inferences about the effect of predictors on response
|
||||
course assumption | predictors are controlled in an experiment or at least accurately measured
|
||||
define a good statistical model | fit, predictive performance, parsimony, interpretability
|
||||
qualitative description of model | response = signal + noise
Y = α + βx + ε
ε = noise
|
||||
define signal | described by a small number of unknown parameters
variation in response explained in terms of predictors
it is the systematic part of the model
|
||||
define noise | residual variation unexplained in the systematic part of the model
can be described in terms of unknown parameters
|
||||
what does a good statistical model do to possibly large and complex data | reduces it to a small number of parameters
|
||||
a model will fit well if | the systematic part of the model describes much of the variation in the response (low noise)
large number of parameters may be required to do this
|
||||
define parsimony | smaller number of parameters = greater reduction of data, more useful for making a decision
|
||||
there is a cycle between what? | tentative model formulation, estimation of parameters and model criticism
|
||||
a good model will | manage balance between goodness of fit and complexity
provide a useful reduction of the data
|
||||
model response variable in terms of a single predictor | yn = values of the response variable
|
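The correlation coefficient and the straight-line model (response = signal + noise) from the cards above can be tried out numerically. The following is a minimal Python sketch, not part of the original cards: the data values are made up, the function names are hypothetical, and it assumes the "best" line is the least-squares line, which the cards do not state.

```python
import math


def mean(values):
    """Arithmetic mean (xbar or ybar on the cards)."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (sx or sy), using the n - 1 divisor."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """r = 1/(n-1) * sum of products of standardised x and y values."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


def least_squares_line(xs, ys):
    """Intercept alpha and slope beta of the least-squares line.

    Assumption: least squares is the fitting criterion.
    beta = r * sy / sx, alpha = ybar - beta * xbar.
    """
    beta = correlation(xs, ys) * sample_sd(ys) / sample_sd(xs)
    alpha = mean(ys) - beta * mean(xs)
    return alpha, beta


# Made-up illustrative data: x is the explanatory variable, y the response.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r = correlation(xs, ys)
alpha, beta = least_squares_line(xs, ys)

# Signal is alpha + beta * x; what is left over is the noise (residuals).
residuals = [y - (alpha + beta * x) for x, y in zip(xs, ys)]

print("r =", round(r, 4))
print("alpha =", round(alpha, 4), "beta =", round(beta, 4))
print("residuals =", [round(e, 4) for e in residuals])
```

A value of r close to 1 here goes with small residuals, which matches the card on when a model fits well: the systematic part describes most of the variation in the response.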
Created by:
Nymphette