Sunday, February 19, 2017

Econometrics: Angrist and Pischke are at it Again

Check out the new Angrist-Pischke (AP), "Undergraduate Econometrics Instruction: Through Our Classes, Darkly".

I guess I have no choice but to weigh in. The issues are important, and my earlier AP post, "Mostly Harmless Econometrics?", is my all-time most popular.

Basically AP want all econometrics texts to look a lot more like theirs. But their books and their new essay unfortunately miss (read: dismiss) half of econometrics.

Here's what AP get right:

(Goal G1) One of the major goals in econometrics is predicting the effects of exogenous "treatments" or "interventions" or "policies". Phrased in the language of estimation, the question is "If I intervene and give someone a certain treatment \(\partial x\), \(x \in X\), what is my minimum-MSE estimate of her \(\partial y\)?" So we are estimating the partial derivative \(\partial y / \partial x\).

AP argue the virtues and trumpet the successes of a "design-based" approach to G1. In my view they make many good points as regards G1: discontinuity designs, dif-in-dif designs, and other clever modern approaches for approximating random experiments indeed take us far beyond "Stones'-age" approaches to G1. (AP sure turn a great phrase...). And the econometric simplicity of the design-based approach is intoxicating: it's mostly just linear regression of \(y\) on \(x\) and a few cleverly-chosen control variables -- you don't need a full model -- with White-washed standard errors. Nice work if you can get it. And yes, moving forward, any good text should feature a solid chapter on those methods.
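To make "linear regression with White-washed standard errors" concrete, here is a minimal sketch, not anything taken from AP. It assumes a simulated randomized binary treatment `x`, a single control `w`, a true treatment effect of 2, and heteroskedastic errors; the function name `ols_white` and all the numbers are hypothetical. The standard errors are the basic White (HC0) sandwich form.

```python
import numpy as np


def ols_white(y, X):
    """OLS coefficients plus White (HC0) heteroskedasticity-robust standard errors."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # "Meat" of the sandwich: sum over i of e_i^2 * x_i x_i'
    meat = X.T @ (X * resid[:, None] ** 2)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))


# Simulated data (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)                          # a control variable
x = rng.integers(0, 2, size=n).astype(float)    # randomized binary treatment
e = rng.normal(size=n) * (1.0 + x)              # error variance depends on x
y = 1.0 + 2.0 * x + 0.5 * w + e                 # true treatment effect is 2

X = np.column_stack([np.ones(n), x, w])
beta, se = ols_white(y, X)
print("estimated treatment effect:", beta[1])
print("robust standard error:     ", se[1])
```

Because the treatment is randomized in this simulation, the coefficient on `x` estimates the treatment effect without a full model of \(y\), which is the appeal of the design-based approach for G1.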

Here's what AP miss/dismiss:

(Goal G2) The other major goal in econometrics is predicting \(y\). In the language of estimation, the question is "If a new person \(i\) arrives with covariates \(X_i\), what is my minimum-MSE estimate of her \(y_i\)?" So we are estimating a conditional mean \(E(y | X)\), which in general is very different from estimating a partial derivative \(\partial y / \partial x\).
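A small simulation can illustrate how far apart the two targets can be. This is only a sketch under assumptions of my own choosing: a hypothetical unobserved confounder `u` drives both \(x\) and \(y\), the true causal effect \(\partial y / \partial x\) is set to 1, and everything is Gaussian, so \(E(y | x)\) is linear in \(x\) with slope 2.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.normal(size=n)                        # unobserved confounder (hypothetical)
x = u + rng.normal(size=n)                    # x is correlated with u
y = 1.0 * x + 2.0 * u + rng.normal(size=n)    # causal effect dy/dx is 1 by construction

# Least-squares fit of y on x alone approximates the conditional mean E(y | x).
# np.polyfit returns coefficients from the highest degree down.
slope, intercept = np.polyfit(x, y, 1)

# In-sample squared-error comparison of two candidate predictors of y given x.
mse_predictive = np.mean((y - (intercept + slope * x)) ** 2)
mse_causal = np.mean((y - 1.0 * x) ** 2)

print("conditional-mean slope:", slope)              # close to 2, not 1
print("MSE using fitted E(y | x):", mse_predictive)
print("MSE using causal slope:   ", mse_causal)
```

In this setup the fitted slope is the wrong answer for G1 but the right tool for G2: it predicts \(y\) with lower mean squared error than the true causal coefficient does.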

The problem with the AP paradigm is that it doesn't work for goal G2. Modeling nonlinear functional form is important, as the conditional mean function \(E(y | X)\) may be highly nonlinear in \(X\); systematic model selection is important, as it's not clear a priori what subset of \(X\) (i.e., what model) might be most important for approximating \(E(y | X)\); detecting and modeling heteroskedasticity is important (in both cross sections and time series), as it's the key to accurate interval and density prediction; detecting and modeling serial correlation is crucially important in time-series contexts, as "the past" is the key conditioning information for predicting "the future"; etc.
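As one illustration of the first two items (nonlinear functional form and systematic model selection), here is a rough sketch that picks a polynomial degree by hold-out mean squared error. The data-generating process, the candidate degrees 1 through 6, and the 50/50 split are all assumptions made for the example; in practice one might prefer cross-validation or an information criterion.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
x = rng.uniform(-2.0, 2.0, size=n)
y = np.sin(2.0 * x) + 0.3 * rng.normal(size=n)   # nonlinear E(y | x), assumed for illustration

# Simple split: first half for fitting, second half for validation.
idx = np.arange(n)
train = idx < n // 2
valid = ~train

val_mse = {}
for degree in range(1, 7):
    coefs = np.polyfit(x[train], y[train], degree)   # fit on the training half
    pred = np.polyval(coefs, x[valid])               # predict on the held-out half
    val_mse[degree] = np.mean((y[valid] - pred) ** 2)

best_degree = min(val_mse, key=val_mse.get)
for degree, mse in val_mse.items():
    print(f"degree {degree}: hold-out MSE = {mse:.4f}")
print("selected degree:", best_degree)
```

A straight line does poorly here because the conditional mean is far from linear; the selection step is what tells you how much flexibility the prediction problem actually needs.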

(Notice how often "model" and "modeling" appear in the above paragraph. That's precisely what AP dismiss, even in their abstract, which very precisely, and incorrectly, declares that "Applied econometrics ...[now prioritizes]... the estimation of specific causal effects and empirical policy analysis over general models of outcome determination".)

The AP approach to goal G2 is to ignore it, in a thinly veiled attempt to equate econometrics exclusively with G1. Sorry guys, but no one's buying it. That's why the textbooks continue to feature G2 tools and techniques so prominently, as well they should.