econ 570 midterm 2
| Term | Definition |
|---|---|
| What does 'import statsmodels.api as sm' do? | It imports the main statsmodels library which includes models like OLS and ANOVA. |
| What does 'import statsmodels.formula.api as smf' do? | It allows you to create statistical models using formulas like 'y ~ x1 + x2'. |
| What is the syntax to specify a model using smf? | smf.ols('y ~ x1 + x2', data=df) |
| What type of model does smf.ols() create? | An ordinary least squares (OLS) linear regression model. |
| How do you estimate the coefficients of a model? | Use .fit() after defining the model, e.g., results = model.fit() |
| How do you display regression results in full detail? | Use print(results.summary()) |
| How do you view only the estimated coefficients? | results.params |
| How do you get confidence intervals for coefficients? | results.conf_int() |
| What does it mean if the confidence interval includes 0? | The coefficient is not statistically significant at the corresponding level (5% for a 95% interval). |
| How do you get the R-squared value from a regression? | results.rsquared |
| What is the formula syntax for creating a regression model? | DependentVariable ~ IndependentVariable(s) |
| Example of syntax for regression with multiple controls | sleep_model = smf.ols('sleep ~ twork + educ + age', data=sleep) |
| What function allows you to estimate model parameters? | .fit() |
| Example of fitting and printing a model | results = sleep_model.fit(); print(results.summary()) |
| What does np.log() do in regression modeling? | It takes the natural log of a variable so that coefficients can be read as elasticities or proportional changes. |
| Example of log transformation in model | smf.ols('sleep ~ twork + educ + np.log(age)', data=sleep) |
| Why use np.log(age) instead of age? | Because age effects usually diminish over time and log makes the relationship more linear. |
| How do you include interaction effects between variables? | Use the * operator, which adds both main effects and their interaction (educ*age expands to educ + age + educ:age), e.g., smf.ols('sleep ~ totwrk + educ*age', data=sleep) |
| What does a negative educ:age coefficient indicate? | As age increases, the effect of education on sleep decreases. |
| How do you include a categorical fixed effect? | Add C(variable), e.g., smf.ols('sleep ~ totwrk + educ + age + C(gender)', data=sleep) |
| How do you detect heteroskedasticity? | Plot the residuals against a regressor or the fitted values and check whether their spread changes across the range (e.g., fans out). |
| How do you correct for heteroskedasticity? | Use fit(cov_type='HC3') |
| Example of heteroskedasticity correction | prac_results_h3 = smf.ols('cigs ~ np.log(income)', data=smoke).fit(cov_type='HC3') |
| What function is used to make scatter plots? | ax.scatter(x, y, marker='o', alpha=0.5) |
| What does sns.despine(ax=ax) do? | Removes the top and right spines from the plot for cleaner visuals. |
| What is the correct syntax for a scatter plot of log(income) vs residuals? | ax.scatter(np.log(smoke['income']), prac_results.resid, marker='o', alpha=0.5) |
| What is wrong with this syntax: ax.scatter(np.log(smoke[income]), prac_results.resid)? | The variable 'income' must be quoted: smoke['income'] not smoke[income]. |
| How do you sort data before plotting a regression line? | sleep.sort_values('twork', inplace=True) |
| How do you plot a regression line manually? | Use ax.plot(x, y, color='black') after creating x and y values. |
| Example of creating regression line points | x = [sleep['twork'].min(), sleep['twork'].max()]; y = [p.Intercept + p.twork*i for i in x] (where p is assumed to be results.params) |
| What does results.get_prediction() return? | Predicted values and statistics including confidence intervals. |
| How do you extract confidence intervals from prediction results? | summary_frame = predictions.summary_frame(alpha=0.05) |
| How do you plot confidence intervals manually? | ax.fill_between(x, ci_lower, ci_upper, color='blue', alpha=0.2, edgecolor='none') |
| Why not use seaborn for regression confidence intervals? | Seaborn doesn’t show model equations or specific statistical results. |
| What import is used for instrumental variable regression? | from linearmodels.iv.model import IV2SLS |
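| What is the purpose of regression discontinuity? | To estimate a treatment effect by comparing outcomes just above and just below a cutoff or discontinuity. |
| What does Difference in Differences (DiD) estimate? | The causal effect of a treatment, by comparing the before/after change in a treated group with the before/after change in a control group. |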
| What is the Parallel Trends Assumption? | The treated and control groups would have moved similarly over time if no treatment occurred. |
| How can you tell if parallel trends plausibly hold? | Pre-treatment coefficients are close to 0 and their error bars cross 0. |
| What does it mean if error bars cross 0? | The difference is not statistically significant. |
| What library is used for time series analysis? | import statsmodels.tsa.api as tsa |
| What does tsa.adfuller() test for? | Stationarity of a time series. |
| What is the null hypothesis of the Dickey-Fuller test? | That the series is non-stationary. |
| How do you interpret a p-value < 0.05 in Dickey-Fuller? | Reject the null hypothesis → the series is stationary. |
| What function estimates autoregressive models? | tsa.AutoReg() |
| Example of fitting an AR(2) model | ri_unemp = tsa.AutoReg(ri_data, lags=2).fit() |
| How do you forecast future values in a time series? | Use .forecast(steps) |
| Example of forecasting eight future periods | predictions = ri_unemp.forecast(8) |
| How do you plot observed vs forecasted data? | Plot lines with different colors for observed (blue) and forecasted (red). |
| What does Difference in Differences require? | At least two groups (treated/control) and two time periods (before/after). |
| What are event studies used for? | To explore pre- and post-treatment trends over time. |
| What function creates dummy variables? | pd.get_dummies() |
| Why drop a reference category after get_dummies()? | To avoid perfect multicollinearity with the intercept (the dummy variable trap). |
| Example of dropping a reference column | sw06.drop(columns='Di_mi', inplace=True) |
| What function merges coefficient estimates and confidence intervals? | pd.concat([results.conf_int(), results.params], axis=1) |
| How do you plot event study coefficients with error bars? | ax.errorbar(x, y, yerr, color='red', capsize=2) |
| What function draws shaded confidence intervals? | ax.fill_between(x, lower_ci, upper_ci, color='blue', alpha=0.1) |
| What is wrong with this syntax: ax.fill_between(sleep['twork'], ci_lower, ci_upper, color="CO", alpha=0.2)? | Invalid color code: "CO" uses the letter O. Use a valid color such as 'C0' (with a zero), 'blue', or 'gray'. |
| What function creates horizontal and vertical reference lines? | ax.axhline() and ax.axvline() |
| What is wrong with this syntax: ax.axvline(0, linestyle='dashed', color'=blue')? | The opening quote is misplaced before the '=' → should be color='blue'. |
| What does sns.despine(ax=ax) achieve? | Removes unnecessary chart borders for cleaner visuals. |
| What do .set_xlabel() and .set_ylabel() do? | They label the x-axis and y-axis, respectively. |
| What is the syntax to label axes? | ax.set_xlabel('label', fontsize=14); ax.set_ylabel('label', fontsize=14) |
| What does the parameter alpha control in scatter plots? | The transparency of points. |
| What happens if you forget to quote a column name when indexing a DataFrame, e.g., smoke[income]? | Python treats it as a variable name and raises a NameError because that variable is undefined. |
| What function combines two dataframes vertically? | pd.concat([df1, df2]) |
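| What does .summary_frame(alpha=0.05) return? | A DataFrame containing mean predictions, their standard errors, and confidence bounds (for OLS: mean, mean_se, mean_ci_lower, mean_ci_upper, plus obs_ci_lower and obs_ci_upper for prediction intervals). |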
| What is the purpose of cov_type='HC3'? | To correct standard errors for heteroskedasticity. |
| What function removes columns from a DataFrame? | .drop(columns='column_name', inplace=True) |
| What function returns unique values from an index? | .unique() |
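| How do you create a DatetimeIndex with an explicit frequency (e.g., annual)? | pd.DatetimeIndex(df.index.values, freq=df.index.inferred_freq, name='date'), where inferred_freq supplies the frequency detected from the existing index |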
| What is the dependent variable in sleep_model = smf.ols('sleep ~ twork + educ + age', data=sleep)? | sleep |
| What is the correct syntax for adding interaction terms? | variable1 * variable2 |
| What does smf.ols stand for? | Ordinary Least Squares regression using the formula API. |
| What is the default alpha level for 95% confidence intervals? | 0.05 |
| What function gives fitted residuals from a model? | results.resid |
| What does results.params['x1'] represent? | The estimated coefficient for variable x1. |
| How can you interpret a negative coefficient? | A one-unit increase in x is associated with a decrease in y equal to the coefficient's absolute value, holding the other variables constant. |
| What causes multicollinearity? | Including two or more highly correlated variables (like both educ and educ*age). |
| What library provides seaborn plotting functions? | import seaborn as sns |
| How do you interpret the intercept in an OLS model? | Expected value of y when all x variables are 0. |
| What is the difference between sm and smf? | sm provides core statistical models; smf allows model creation using formulas. |
| What function gives fitted values from regression? | results.fittedvalues |
| What does OLS stand for? | Ordinary Least Squares. |
| What does the ‘~’ operator mean in a formula? | It separates dependent and independent variables. |
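
The regression workflow covered by the cards above (specify with a formula, fit, inspect coefficients and confidence intervals, switch to HC3 standard errors, predict) can be sketched end to end as follows. This is a minimal sketch under stated assumptions: the data are synthetic values generated with NumPy, not the course's sleep dataset, and the column names (sleep, twork, educ, age) are borrowed from the cards purely for illustration. The `new` DataFrame is likewise hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in data (assumption: made-up values, not the real dataset).
rng = np.random.default_rng(0)
n = 500
sleep = pd.DataFrame({
    "twork": rng.uniform(0, 4000, n),
    "educ": rng.integers(8, 18, n),
    "age": rng.integers(23, 65, n),
})
sleep["sleep"] = 3500 - 0.15 * sleep["twork"] + rng.normal(0, 400, n)

# Specify the model with a formula, then estimate it with .fit().
sleep_model = smf.ols("sleep ~ twork + educ + age", data=sleep)
results = sleep_model.fit()

# Same model with heteroskedasticity-robust (HC3) standard errors.
robust = sleep_model.fit(cov_type="HC3")

# Coefficients side by side with their 95% confidence intervals.
table = pd.concat([results.params, results.conf_int()], axis=1)
table.columns = ["coef", "ci_lower", "ci_upper"]
print(table)
print("R-squared:", results.rsquared)

# HC3 changes the standard errors, not the point estimates.
print(pd.concat([results.bse, robust.bse], axis=1, keys=["ols_se", "hc3_se"]))

# Predictions with confidence bounds for two hypothetical observations.
new = pd.DataFrame({"twork": [1000, 3000], "educ": [12, 12], "age": [40, 40]})
pred = results.get_prediction(new).summary_frame(alpha=0.05)
print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])
```

A coefficient whose `ci_lower` and `ci_upper` have opposite signs has an interval that includes 0, which is the "not statistically significant" case described in the cards.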