Multiple Regression - P264B


Question

Answer

Regression partitions the total variation in y into what two components?

SST = SSR + SSE. SS Total (total variation in y) = SS Regression (explained) + SS Error (unexplained).
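A minimal sketch of the partition for a simple regression, using made-up data (the x and y values below are assumptions for illustration, not from the course):

```python
# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates of the slope (b1) and intercept (b0).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation in y
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained by the regression
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # left in the residuals

print(sst, ssr, sse)
```

Up to floating-point rounding, the first printed value equals the sum of the other two.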


What is the interpretation of the slope (b1) coefficient?

For a 1-unit increase in x, the predicted value of y changes by b1 units.


Unstandardized regression coefficient

Slope (b)


Why are standardized regression coefficients useful?

Standardizing puts the regression coefficients for multiple variables on a common scale (standard deviation units), so their relative sizes can be compared (b becomes beta).
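As a rough illustration, the conversion from b to beta can be sketched as beta = b * (SD of x / SD of y). The data below are invented; in a simple regression the result should match the Pearson correlation r:

```python
import math

# Hypothetical data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx                      # unstandardized slope
sd_x = math.sqrt(s_xx / (n - 1))     # sample SD of x
sd_y = math.sqrt(s_yy / (n - 1))     # sample SD of y
beta = b * sd_x / sd_y               # standardized slope
r = s_xy / math.sqrt(s_xx * s_yy)    # Pearson correlation

print(b, beta, r)
```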


Standardized regression coefficient

Beta


Coefficient of determination

R² = SSR/SST gives the proportion of variance in y accounted for by x. It pertains only to linear association.


Why is the sum of the squared residuals, or errors, a minimum in OLS regression?

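The least squares procedure chooses b0 and b1 to be exactly the values that minimize the sum of squared differences between observed and predicted y, so no other line can give a smaller SSE.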


What is the standard error of estimate (SEE)?

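SEE is the standard deviation of the residuals about the regression line: roughly the typical distance of a point from the line. Also called root mean square error (because it is the square root of MSE).

How is SEE calculated?

SEE = s y.x = square root of MSE = square root of (SSE / (n - p)), where p is the number of estimated parameters (including the intercept). When the correlation between x and y is 1, SEE = 0 (all points are on the line).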
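A small sketch of the SEE calculation for a simple regression. The function name `see` and the data are assumptions for illustration; p defaults to 2 because b0 and b1 are both estimated:

```python
import math

def see(x, y, p=2):
    """Standard error of estimate: sqrt(SSE / (n - p)) for a simple OLS fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - p)
    return math.sqrt(mse)

# Hypothetical data: scattered points, then points lying exactly on a line.
print(see([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0]))
print(see([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

The second call uses perfectly correlated data, so its SEE should be zero up to rounding.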


How does SEE differ from the standard error of the slope (regression coefficient)?

SE(b) is the standard error of the slope (regression coefficient): it measures how precisely the slope is estimated. SEE measures the scatter of the observations about the regression line.


What is the impact of non-constant error variance on the MSE?

Non-constant error variance should increase the MSE (MSE = SSE/df).


What is one impact of an inflated (large) MSE?

Reduces predictive power; reduces the coefficient of determination (R²).


What departures from OLS regression can be studied using residual plots?

1. Non-linear regression function 2. Non-constant variance (heteroscedasticity) 3. Correlated error terms (not independent) 4. Distribution of error terms not normal 5. Omitted an important IV from the model 6. Outliers


Departures from OLS regression ACRONYM

Homoscedasticity, Independence, Linearity, Outliers, Omitted variable, Normal distribution


How is a residual (error term) calculated?

Observed value minus the predicted value from the regression equation: e = y - y hat


How is a standardized residual calculated?

Z = e_i / SEE, where SEE = square root of MSE and MSE = SSE/df
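The formula can be sketched for a simple regression as follows. The function name `zresid` and the data are hypothetical, and df is taken to be n - 2:

```python
import math

def zresid(x, y):
    """Standardized residuals e_i / SEE for a simple OLS fit (df = n - 2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    see = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return [e / see for e in residuals]

# Hypothetical data for illustration only.
z = zresid([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print([round(zi, 3) for zi in z])
```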


What are some of the drawbacks of using ZRESID to identify “unusual” cases?

ZRESID doesn’t account for cases with unusual x values, and an outlying case can “mask” its own effect by inflating SSE (and therefore MSE), which shrinks its standardized residual.


What three dimensions are used to characterize atypical or unusual observations?

Leverage, Discrepancy, and Influence


Leverage

represents how unusual the case is in terms of its x value (extreme in predictor set)


Discrepancy

how unusual a value of y is for a given value of x (conditioned on x)


Influence

how much of an impact each individual observation has on the global regression analysis (DFFITS) and on estimates of the regression coefficients (DFBETA)


Conceptually, what is an externally studentized residual (SDRESID)?

It calculates the residual for a point with that point excluded from the MSE, to determine how “unusual” the case is compared to the rest of the data set. Also known as the jackknifed residual or studentized deleted residual.
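One way to sketch the idea for a simple regression is to refit the model without case i, take the MSE from that reduced fit, and divide the case's residual by sqrt(MSE without i * (1 - h_ii)). The helper names and data below are assumptions for illustration, not SPSS's own routine:

```python
import math

def fit(x, y):
    """Return (b0, b1, SSE) for a simple OLS regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return b0, b1, sse

def sdresid(x, y, i):
    """Externally studentized residual for case i (two estimated parameters)."""
    n = len(x)
    b0, b1, _ = fit(x, y)
    e_i = y[i] - (b0 + b1 * x[i])
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    h_ii = 1 / n + (x[i] - x_bar) ** 2 / s_xx      # leverage of case i
    # Refit without case i so it cannot inflate its own error estimate.
    _, _, sse_del = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    mse_del = sse_del / (n - 1 - 2)
    return e_i / math.sqrt(mse_del * (1 - h_ii))

# Hypothetical data in which the fifth case has an unusually high y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 15.0, 12.1]
print(sdresid(x, y, 4))
```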


Durbin-Watson test

Used to check for serial correlation among the residuals, which should show no relationship when plotted against time. D ranges from 0 to 4; D = 2 means no serial correlation.
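A minimal sketch of the statistic, assuming the residuals are already in time order: D is the sum of squared successive differences divided by the sum of squared residuals. The function name and the toy residuals are illustrative only:

```python
def durbin_watson(residuals):
    """Durbin-Watson D for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Toy residuals: perfectly alternating signs (negative serial correlation)
# and identical values (positive serial correlation).
print(durbin_watson([1, -1, 1, -1]))
print(durbin_watson([1, 1, 1, 1]))
```

Alternating residuals push D above 2, while residuals that persist over time push it toward 0.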


Is Durbin-Watson useful for all types of designs having non-constant error variance?

No, only those where collection is spread over time or if time of collection is a factor


A plot of ZRESID against the IV is useful for studying which types of departures?

Discrepancy


What can a plot of the residuals against a variable not included in the regression equation tell us?

Whether we omitted a key variable (model specification). The omitted variable acts like a covariate: the plot shows what part of the error is associated with it. Including it would therefore reduce MSE.


How might one diagnose problems with non-constant error variance?

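Use residual plots (for example, residuals against predicted values or against x); a fan or funnel shape suggests non-constant variance.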


Will the value of the standardized residual be large for all types of outlying observations?

No, the standardized residual would NOT be large for leveraged outliers.


Name a residual diagnostic that can be used to detect outlying x values.

Leverage (h_ii, the diagonal elements of the hat matrix)
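For a simple regression with one predictor, leverage reduces to h_ii = 1/n + (x_i - mean of x)² / S_xx. A short sketch with invented x values (the function name is an assumption):

```python
def leverages(x):
    """Leverage h_ii of each case in a one-predictor regression with intercept."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

# Hypothetical x values; the last case sits far from the others.
h = leverages([1.0, 2.0, 3.0, 4.0, 10.0])
print([round(hi, 3) for hi in h])
```

The leverages depend only on x, not on y, and they sum to the number of estimated parameters (2 here).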


What is discrepancy and how is it measured?

Discrepancy is how far y is from the predicted value of y for a given value of x. It is measured by comparing ZRESID and the studentized deleted residual.


What are the two components of influence and what residual diagnostics are used to reflect those two components?

Influence is how much the point moves the line. DFFIT measures influence on predicted y (whole regression equation); DFBETA (x) measures influence on the slope (DFBETA for the constant is less important).
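Both components can be sketched by brute force: refit the regression without case i and see how much that case's fitted value (DFFIT) and the slope (DFBETA) change. These are the unstandardized versions; the helper names and data are assumptions for illustration:

```python
def fit(x, y):
    """Return (b0, b1) for a simple OLS regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    return y_bar - b1 * x_bar, b1

def influence(x, y, i):
    """Return (DFFIT_i, DFBETA_i for the slope) by refitting without case i."""
    b0, b1 = fit(x, y)
    b0_del, b1_del = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
    fitted_full = b0 + b1 * x[i]           # prediction for case i, all cases used
    fitted_del = b0_del + b1_del * x[i]    # prediction for case i, case i deleted
    return fitted_full - fitted_del, b1 - b1_del

# Hypothetical data in which the fifth case has an unusually high y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 15.0, 12.1]
for i in range(len(x)):
    dffit, dfbeta = influence(x, y, i)
    print(i, round(dffit, 3), round(dfbeta, 3))
```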


DFBETA (x)

Measures influence on the slope (DFBETA for the constant is less important); a specific measure of influence.

DFFIT

Measures influence on predicted y (whole regression equation); a global measure of influence.


What measure tells us how much the group of independent variables together estimate y?

Multiple R²


What are the limitations of R² when used to compare between different studies?

It does not separate variables to determine the individual contribution of each variable, controlling for the others in the model


What measure tells us about the contribution of a single IV to estimating y when other variables are included in the regression equation?

Semi-partial correlation (must square to explain variance)


How are these descriptive measures interpreted?

Controlling for other variables, x1 accounts for n% of the variation in y.


Why are the regression coefficients in a multiple regression equation called “partial”?

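Because each coefficient reflects the effect of its IV with the other IVs in the model held constant (partialled out), so it accounts for only “part” of the variation accounted for by the full model.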


How is a regression coefficient in a multiple regression model interpreted?

Controlling for the other variables, for every one-unit increase in x1, predicted y changes by b1 units.


What hypotheses are tested in the ANOVA summary table of a multiple regression model?

Overall fit: H0: R²y.123…p = 0; H1: R²y.123…p > 0. Equivalently: H0: B1 = B2 = B3 = . . . = Bp = 0; H1: at least one B is not equal to 0.
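The F-ratio that tests these hypotheses can be written in terms of R². A small sketch, where k is the number of predictors and n the number of cases (the function name and the example numbers are assumptions for illustration):

```python
def f_from_r2(r2, n, k):
    """Overall F-ratio: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    df_regression = k
    df_error = n - k - 1
    return (r2 / df_regression) / ((1 - r2) / df_error)

# Hypothetical example: R^2 = 0.6 from one predictor and five cases.
print(f_from_r2(0.6, n=5, k=1))
```

This is the same ratio as MS Regression / MS Error in the ANOVA summary table; it is compared against an F distribution with k and n - k - 1 degrees of freedom.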


If the F-statistic is significant, will all of the individual regression coefficients be significant?

Not necessarily. A significant F only indicates that at least one coefficient differs from zero; each coefficient must be tested individually.


What test determines significance of individual regression coefficients?

t-test


What are effects of collinearity on regression?

1. Affects estimates of partial regression coefficients 2. Affects size of SE(b) 3. Makes interpretations more complex b/c estimate of effect depends upon variables included 4. When extreme, there is no unique solution to the regression problem.


What is the term for extreme cases of inter-correlation among the IVs?

Multicollinearity


What factors determine the size of the standard error of a regression coefficient?

1. Specification issues: IV omitted, model doesn’t explain enough variance (large MSE) 2. Restricted range of x: not enough variation in x to show the full range of y 3. Inter-correlation: high correlation → low tolerance → small denominator → big SE
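The three factors line up with the terms of the usual formula SE(b_j) = sqrt(MSE / (SS_xj * tolerance_j)), where tolerance_j = 1 - R²_j and R²_j comes from regressing x_j on the other IVs. A sketch with made-up numbers (the function and argument names are assumptions):

```python
import math

def se_b(mse, ss_x, r2_with_other_ivs):
    """Standard error of a partial regression coefficient.

    mse: error variance of the full model (specification)
    ss_x: sum of squared deviations of the predictor (range of x)
    r2_with_other_ivs: R^2 from regressing this IV on the other IVs
    """
    tolerance = 1 - r2_with_other_ivs
    return math.sqrt(mse / (ss_x * tolerance))

# Hypothetical values: same MSE and spread in x, different inter-correlation.
print(se_b(4.0, 100.0, 0.0))    # IV uncorrelated with the other IVs
print(se_b(4.0, 100.0, 0.75))   # tolerance of 0.25 doubles the SE
```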


Semi-partial correlation (squared)

The increase in R² when x1 is added to an equation containing x2; that is, the proportion of variance in y uniquely accounted for by x1, because all other variables have been statistically controlled.


How is the semi-partial correlation interpreted?

Controlling for other variables, x1 accounts for n% of variation in y.


partial correlation

Correlation between y and x1 when linear effects of other variables are removed from x1 and y.


How is the partial correlation interpreted?

When the variation of other variables is removed from both x1 and y, x1 accounts for n% of the remaining variation in y (squared partial correlation).
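With two predictors, both correlations can be sketched directly from the three pairwise correlations. The correlation values below are invented for illustration, and the function names are assumptions:

```python
import math

def semipartial(r_y1, r_y2, r_12):
    """Semi-partial correlation of y with x1, removing x2 from x1 only."""
    return (r_y1 - r_y2 * r_12) / math.sqrt(1 - r_12 ** 2)

def partial(r_y1, r_y2, r_12):
    """Partial correlation of y with x1, removing x2 from both x1 and y."""
    return (r_y1 - r_y2 * r_12) / math.sqrt((1 - r_y2 ** 2) * (1 - r_12 ** 2))

# Hypothetical pairwise correlations.
r_y1, r_y2, r_12 = 0.6, 0.5, 0.4

sr1 = semipartial(r_y1, r_y2, r_12)
pr1 = partial(r_y1, r_y2, r_12)

# R^2 of the two-predictor model, from the same three correlations.
r2_full = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)

print(round(sr1 ** 2, 4), round(r2_full - r_y2 ** 2, 4))  # unique variance, two ways
print(round(pr1 ** 2, 4))
```

The squared semi-partial equals the gain in R² from adding x1 to a model that already contains x2, while the squared partial divides that same gain by the variance in y that x2 left unexplained, so it is never smaller.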



Created by: bkflyer
