Thursday, February 27, 2014

On Assessing Convergence to a Steady State

I recently listened to a stimulating statistics talk, "Discerning a Steady State Sequentially," by Moshe Pollak (with Tom Hope), presently visiting Penn. Of course it's impossible to know with certainty whether we're in steady state based on a finite sample path, but the point is that we may nevertheless be able to make probabilistic statements, effectively "testing the hypothesis" that we're in steady state.

Moshe takes a sequential analytic approach. Here's his abstract: "In many contexts one observes a stochastic process with the goal of learning steady-state characteristics. This talk addresses the question of how to declare with confidence that steady-state has been reached. We focus on a sequence of independent observations that tends in a stochastically monotone fashion to a constant distribution."

Moshe's obvious limitation is independence, as steady states of simulated Markov chains, not independent sequences, are the object of interest in many important applications (posterior simulation, global optimization, etc.).

In the Markov chain case, why not do something like the following? Whenever time \(t\) is a multiple of \(m\), use a distribution-free non-parametric (randomization) test for equality of distributions to test whether the unknown distribution \(f_1\) of \(x_t, ..., x_{t-(m/2)+1}\) equals the unknown distribution \(f_2\) of \(x_{t-(m/2)}, ..., x_{t-m+1}\). If, for example, we pick \(m=20,000\), then whenever time \(t\) is a multiple of 20,000 we would test equality of the distributions of \(x_t, ..., x_{t-9999}\) and \(x_{t-10000}, ..., x_{t-19999}\), so that each half-window contains exactly 10,000 observations. We declare arrival at the steady state when the null is not rejected. Or something like that.

Of course the Markov chain is serially correlated, but who cares, as we're only trying to assess equality of unconditional distributions. That is, randomizations of \(x_t, ..., x_{t-(m/2)+1}\) and of \(x_{t-(m/2)}, ..., x_{t-m+1}\) destroy the serial correlation, but so what?
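
Here is a rough Python sketch of that idea, just to make it concrete. Everything specific in it is my own illustrative assumption rather than anything from Moshe's talk: the Kolmogorov-Smirnov distance as the test statistic, the 5% level, 200 permutations, and the function names `ks_statistic`, `permutation_pvalue` and `first_steady_time`. Like the proposal itself, it permutes the pooled window as if the draws were exchangeable, so serial correlation in the chain may distort the test's actual size.

```python
import numpy as np

def ks_statistic(a, b):
    """Largest gap between the two empirical CDFs (two-sample KS distance)."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    return float(np.max(np.abs(cdf_a - cdf_b)))

def permutation_pvalue(recent, older, n_perm=200, rng=None):
    """Randomization p-value for equality of the two distributions."""
    rng = np.random.default_rng() if rng is None else rng
    recent, older = np.asarray(recent), np.asarray(older)
    observed = ks_statistic(recent, older)
    pooled = np.concatenate([recent, older])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # shuffling destroys the time ordering
        if ks_statistic(perm[:recent.size], perm[recent.size:]) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

def first_steady_time(x, m, alpha=0.05, n_perm=200, seed=0):
    """First multiple of m at which the two halves of the last m draws
    are not found to differ; None if that never happens.
    Assumes m is even and x[0] is the first draw of the chain."""
    x = np.asarray(x)
    rng = np.random.default_rng(seed)
    half = m // 2
    for t in range(m, x.size + 1, m):
        older, recent = x[t - m:t - half], x[t - half:t]
        if permutation_pvalue(recent, older, n_perm, rng) > alpha:
            return t
    return None

if __name__ == "__main__":
    # Hypothetical example: an AR(1) chain started far from its steady state.
    rng = np.random.default_rng(42)
    x = np.empty(4000)
    x[0] = 50.0
    for i in range(1, x.size):
        x[i] = 0.9 * x[i - 1] + rng.normal()
    print(first_steady_time(x, m=1000))
```

Note that "not rejected" is doing a lot of work here: a small `m` gives a low-power test that will happily declare steady state too early, so the choice of `m` matters.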

My suggestion is either misguided for some reason that I'm missing, or someone must have done it. (It's just too obvious.)

Monday, February 24, 2014

More on Factor-Augmented VAR's (Principal Components Regression)

Here's a sampling of emails that I received on my recent "Factor-Augmented VAR" post.

Serena Ng at Columbia notes that her "Targeted Predictors" paper (with Jushan Bai) is motivated by considerations similar to those that motivate partial least squares (PLS). She also notes that she has a discussion of this in a Handbook of Forecasting overview paper (sections 4 and 5), and that it's not clear that PLS is systematically dominant. I look forward to reading the Handbook piece, which, embarrassingly, I have not yet done.

George Kapetanios at Queen Mary, University of London, echoes Serena's view. He sent his new paper, "Revisiting Useful Approaches to Data-Rich Macroeconomic Forecasting" (with Jan Groen), in the PLS-Kelly-Pruitt tradition but considering more general settings (e.g., weak factors) and considering alternative methods such as ridge regression. His upshot is that forecasting is of course complex and "best" procedures depend on a variety of settings and choices (and the "best" may not be PLS), but that in any event principal-component regression (PCR) appears robustly sub-optimal.

Not least, Frank DiTraglia at Penn sent some interesting links to the chemometrics literature, which prominently features PLS and has some interesting probabilistic perspectives on it.

So it's interesting. PCR appears robustly sub-optimal, but there's a feeling that PLS, at least as traditionally implemented, does not appear robustly dominant. Other emails, not reported in detail here, echoed that theme.

So much for the PCR vs. PLS issue. What about the PCR vs. ridge regression (RR) issue? Enter Paramveer Dhillon, a Penn Computer Science (machine learning) Ph.D. student, who sent his paper, "A Risk Comparison of Ordinary Least Squares vs Ridge Regression" (with Dean Foster, Sham Kakade and Lyle Ungar). Paramveer et al. show that PCR risk is always within a factor of four of RR risk, but that the converse is not true; that is, RR can be arbitrarily worse than PCR. So from a different perspective PCR suddenly looks appealing. (And from the Blogger-Abusing-His-Position-to-Pat-Himself-on-the-Back Department: Paramveer also notes that he enjoyed my Ph.D. time-series course, which he audited last year!)

[Finally, my friends, just in case you missed the weekend post I'll repeat it: Please don't hesitate to post comments. Instead what usually happens is that people email me directly, and I can't respond, and I feel bad that I can't respond, and the sender feels bad that I didn't respond, and most importantly, people who would benefit from reading the comment (and perhaps reading comments on the comment, or themselves commenting on the comment) never get to see it. A bad equilibrium all around.]

Saturday, February 22, 2014

Monday, February 17, 2014

Thoughts on "Factor-Augmented VAR's"

Let's use the standard term, principal-components regression (PCR). It's irrelevant whether it's a "regular" regression or an autoregression, univariate or multivariate. Econometricians have always liked PCR. (I am no exception.) In this "data-rich" age it's more useful than ever, and things like Bernanke and Boivin's factor-augmented vector autoregressions have taken PCR to new heights of popularity.

But PCR has some awkward aspects, well-known in some circles (see, e.g., Hastie, Tibshirani and Friedman, Elements of Statistical Learning, Chapter 3) but curiously little-known in others.

In particular:

(1) First-step PC extraction is "unsupervised" (in machine-learning jargon). Hence the x-variable linear combinations given by the PC's may differ importantly from the best x-variable linear combinations for predictive purposes. This is unfortunate because second-step PCR typically is used for prediction!

(2) PCR shrinks in rather awkward/extreme directions/amounts. PCR shrinks the excluded PC's completely to 0 (by construction), and moreover, it treats the included PC's all alike, leaving them entirely unshrunk regardless of the relative sizes of their associated eigenvalues.

So, what to do?

(1) Wold's partial least squares (PLS) attempts to address issue (1). Recent interesting work, moreover, extends PLS in powerful ways, as with the Kelly-Pruitt three-pass regression filter and its amazing apparent success in predicting aggregate equity returns.

(2) Ridge regression (among others) addresses issue (2). It includes all PC's and shrinks them toward 0 according to the relative sizes of their associated eigenvalues.
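
To see the contrast in numbers, here is a small Python sketch of the per-component shrinkage factors in the usual textbook formulation (as in the Elements of Statistical Learning treatment): PCR applies a factor of 1 to each retained PC and 0 to each dropped PC, whereas ridge applies \(d_j^2/(d_j^2+\lambda)\) to PC \(j\), where \(d_j^2\) is the associated eigenvalue of \(X'X\). The function name `shrinkage_factors`, the toy data, and the values of `k` and `lam` are illustrative assumptions only.

```python
import numpy as np

def shrinkage_factors(X, k, lam):
    """Eigenvalues of X'X (after centering) with the PCR and ridge
    shrinkage factors applied to each principal component."""
    Xc = X - X.mean(axis=0)                    # center the regressors
    d = np.linalg.svd(Xc, compute_uv=False)    # singular values, descending
    eig = d ** 2                               # eigenvalues of Xc'Xc
    pcr = (np.arange(eig.size) < k).astype(float)  # keep first k, drop the rest
    ridge = eig / (eig + lam)                  # smooth, eigenvalue-dependent
    return eig, pcr, ridge

# Toy design with orthogonal columns, so the eigenvalues are easy to read off.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
eig, pcr, ridge = shrinkage_factors(X, k=1, lam=2.0)
print(eig, pcr, ridge)
```

In this toy case the eigenvalues are 8 and 2, so with `lam=2.0` ridge scales the two PC's by 0.8 and 0.5, while one-component PCR scales them by 1 and 0.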

Thursday, February 13, 2014

Congratulations to Loretta Mester, New President of The Federal Reserve Bank of Cleveland

Loretta J. Mester, Executive Vice President and Director of Research
Loretta is presently the Director of Research at the Philadelphia Fed, and she will replace Cleveland's Sandra Pianalto effective June 1. She has been a stunningly effective research director in Philadelphia; indeed her loss is a terrible blow to FRB Philadelphia and a massive gain for FRB Cleveland.  Not least I'm personally grateful for her enthusiastic support of FRB Philadelphia's ADS index and GDPplus series.  I will miss her, and I wish her every success in her new role.  
For additional information, see the Reuters article.

Monday, February 10, 2014

NBER Econometrics "Methods Lectures" Videos

    For nearly a decade, the National Bureau of Economic Research has been holding a day of econometrics "Methods Lectures" during the Summer Institute, with the speakers and sub-topic changing each year.

    Evidently it's not widely known that the lecture videos and slides are available online -- just click on any of the links below.

    [Warning: Certain of the links reveal that audio/video recording/delivery is not the NBER's strong suit, but all the videos are there if you take a few minutes to figure things out.]

    Summer Institute 2013
    Econometric Methods for High-Dimensional Data
    Victor Chernozhukov, Massachusetts Institute of Technology, Matthew Gentzkow, University of Chicago and NBER, Christian Hansen, University of Chicago, Jesse Shapiro, University of Chicago and NBER, Matthew Taddy, University of Chicago

    Summer Institute 2012
    Econometric Methods for Demand Estimation
    Ariel Pakes, Harvard University and NBER and Aviv Nevo, Northwestern University and NBER

    Summer Institute 2011
    Computational Tools & Macroeconomic Applications
    Lawrence Christiano, Northwestern University and NBER and Jesus Fernandez-Villaverde, University of Pennsylvania and NBER

    Summer Institute 2010
    Financial Econometrics
    Sydney Ludvigson, New York University and NBER, Yacine Ait-Sahalia, Princeton University and NBER, Michael Brandt, Duke University and NBER and Andrew Lo, MIT and NBER

    Summer Institute 2009
    Using Field Experiments in Economics: An Introduction, and Conducting Field Research in Developing Countries
    John List, University of Chicago and NBER and Michael Kremer, Harvard University and NBER

    Summer Institute 2008
    What's New in Econometrics – Time Series
    James H. Stock, Harvard University and NBER and Mark W. Watson, Princeton University and NBER

    Summer Institute 2007
    What's New in Econometrics?
    Guido Imbens, Harvard University and NBER and Jeffrey Wooldridge, Michigan State University

Friday, February 7, 2014

Monday, February 3, 2014

Research Credibility, Bayes, and "Searching for Asterisks"

Is there really a "credibility crisis" in the sciences that use statistics, as some seem to fear these days? I think not; generally I'm on board with Deming's "In God we trust, all others bring data." Of course there are issues, but they're hardly new. Some simply reflect poor understanding of statistics. For example, a Bayesian calculation of post-study probability, \( P(H_0 ~ true ~|~ data ) \), is very different from a classical \(p\)-value, \( P(data ~|~ H_0 ~ true) \). The former can be large even when the latter is very small -- notwithstanding the fact that the two are often naively confused. Other issues are real -- like the effects of "searching for asterisks" (data mining, in the bad sense) and the corresponding "file-drawer problem" in which "insignificant" results languish in file drawers, unsubmitted, unpublished and unseen -- but lots of existing and ongoing work is helping us to confront them.
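
A back-of-the-envelope Bayes calculation makes the gap between the two probabilities concrete. The Python sketch below is a simplification under stated assumptions: it conditions on the event "the test rejected at level `alpha`" rather than on the exact data, and the prior probability of the null (0.9) and the power values (0.8 and 0.2) are hypothetical numbers chosen purely for illustration. The function name `prob_null_given_rejection` is likewise my own.

```python
def prob_null_given_rejection(prior_null, alpha, power):
    """P(H0 true | test rejects) by Bayes' rule.

    prior_null: prior probability that H0 is true (assumed, not estimated)
    alpha:      test size, P(reject | H0 true)
    power:      P(reject | H0 false)
    """
    false_positives = alpha * prior_null
    true_positives = power * (1.0 - prior_null)
    return false_positives / (false_positives + true_positives)

# Hypothetical inputs: most tested nulls are true, conventional 5% test.
print(prob_null_given_rejection(prior_null=0.9, alpha=0.05, power=0.8))
print(prob_null_given_rejection(prior_null=0.9, alpha=0.05, power=0.2))
```

With these inputs the post-study probability that the null is true is about 0.36 when power is 0.8 and about 0.69 when power is 0.2, even though every one of those "findings" earned its asterisk at the 5% level.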

It's important, however, always to be on alert. Here's some reading on the issue of \( P(H_0 ~ true ~|~ data ) \) vs. \( P(data ~|~ H_0 ~ true) \), which has gotten fresh attention recently. Cohen (1994) is classic, as is its title, "The Earth Is Round (\(p < .05\))." Fast-forwarding twenty years, the Maniadis et al. (2014) AER piece is very interesting (also see the January 2014 issue of Econ Journal Watch, which arrived as spam but turned out to contain an interesting comment with a rejoinder by Maniadis et al.). Last but not least, see Dick Startz's 2013 working paper.