The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications

We consider estimation of and inference about coefficients on a small number of variables of interest in a linear panel data model with additive unobserved individual- and time-specific effects and a large number of additional time-varying confounding variables. We allow the number of these additional confounding variables, $p$, to be larger than the sample size, $n$, and suppose that, in addition to unrestricted time- and individual-specific effects, these confounding variables are generated by a small number, $k \ll n$, of common factors and $p$ weakly dependent disturbances. We allow both the factors and the disturbances to be related to the outcome variable and to the other variables of interest. Because the number of confounding variables is larger than the sample size, we assume that the contribution of the part of the confounding variables not captured by the time-specific effects, individual-specific effects, or common factors can be captured by a relatively small number of terms whose identities are unknown. This structure generalizes standard large factor models, sparsity-based models, and factor-augmented regression models with a small number of ex ante selected regressors. Within this framework, we provide a simple computational algorithm, based on factor extraction followed by lasso regression, for estimating and performing inference about the parameters of interest, and we show that the resulting procedure has good asymptotic properties. As an input into these results, we provide concentration and selection results for a setting in which one is interested in linearly predicting an outcome variable with a large number of potential explanatory variables that may have a factor structure, but where the factors do not capture all of the explanatory power in the available predictors. We also provide a simple k-step bootstrap procedure that may be used to construct inferential statements about the parameters of interest, and we prove its asymptotic validity.
The proposed bootstrap may be of substantive independent interest beyond the present context: it can readily be adapted to other settings involving inference after lasso variable selection, and the proof of its validity requires some new technical arguments. We also provide simulation evidence on the performance of our procedure and illustrate its use in empirical applications.
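To make the "factor extraction followed by lasso" idea and the k-step bootstrap more concrete, here is a minimal NumPy sketch under simplifying assumptions that are ours, not the authors'. It uses a cross-sectional model with no individual or time effects (in the panel setting these would first have to be removed, e.g. by a within transformation). It extracts factors by principal components and partials them out of the outcome y and the variable of interest d. It then runs a double-selection style lasso of each on the estimated idiosyncratic components U_hat, using an ad hoc plug-in penalty, and finishes with OLS on the union of selected columns. The bootstrap resamples observations i.i.d. In each draw it warm-starts both lassos at the full-sample coefficients and runs only `k_steps` coordinate-descent sweeps instead of solving to convergence, which is the sense of "k-step" assumed here. All function names (`factor_lasso`, `k_step_bootstrap`, etc.) are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    # Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, beta0=None, n_sweeps=200):
    # Cyclic coordinate descent for (1/2n)||y - X b||^2 + lam * ||b||_1.
    # A warm start (beta0) with a small n_sweeps gives a k-step approximation.
    n, p = X.shape
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    col_ss = (X ** 2).sum(axis=0) / n
    resid = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            rho = X[:, j] @ resid / n + col_ss[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_ss[j]
            resid += X[:, j] * (beta[j] - new_bj)
            beta[j] = new_bj
    return beta


def extract_factors(X, k):
    # Principal components: columns of F satisfy F'F / n = I_k, and
    # U_hat = X_c - F L' holds the estimated idiosyncratic components.
    Xc = X - X.mean(axis=0)
    n = Xc.shape[0]
    left, _, _ = np.linalg.svd(Xc, full_matrices=False)
    F = np.sqrt(n) * left[:, :k]
    L = Xc.T @ F / n
    return F, Xc - F @ L.T


def partial_out(v, F):
    # Residual from regressing centered v on F (uses F'F = n I).
    vc = v - v.mean()
    return vc - F @ (F.T @ vc) / F.shape[0]


def factor_lasso(y, d, X, k, c=1.1, warm=None, n_sweeps=200):
    # Hypothetical estimator: partial factors out of y and d, lasso each on
    # U_hat, then OLS of y on d plus the union of selected columns.
    n, p = X.shape
    F, U_hat = extract_factors(X, k)
    y_t, d_t = partial_out(y, F), partial_out(d, F)
    base = c * np.sqrt(2.0 * np.log(2 * p) / n)  # ad hoc penalty scale
    by0, bd0 = (None, None) if warm is None else warm
    b_y = lasso_cd(U_hat, y_t, base * y_t.std(), by0, n_sweeps)
    b_d = lasso_cd(U_hat, d_t, base * d_t.std(), bd0, n_sweeps)
    sel = np.flatnonzero((b_y != 0) | (b_d != 0))
    Z = np.column_stack([d_t, U_hat[:, sel]])
    coef, *_ = np.linalg.lstsq(Z, y_t, rcond=None)
    return coef[0], (b_y, b_d)


def k_step_bootstrap(y, d, X, k, n_boot=100, k_steps=1, seed=0):
    # Resample rows i.i.d.; each draw warm-starts both lassos at the
    # full-sample solution and runs only k_steps coordinate-descent sweeps.
    rng = np.random.default_rng(seed)
    alpha_hat, warm = factor_lasso(y, d, X, k)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b], _ = factor_lasso(y[idx], d[idx], X[idx], k,
                                   warm=warm, n_sweeps=k_steps)
    return alpha_hat, draws


# Simulated example: X has a 2-factor structure; U[:, 0] and U[:, 1]
# confound d and y beyond the factors; the true alpha is 1.0.
rng = np.random.default_rng(1)
n, p, k = 500, 100, 2
F = rng.normal(size=(n, k))
U = rng.normal(size=(n, p))
X = F @ rng.normal(size=(p, k)).T + U
d = F @ np.array([1.0, -1.0]) + U[:, 0] + U[:, 1] + rng.normal(size=n)
y = 1.0 * d + F @ np.array([0.5, 0.5]) + U[:, 0] + U[:, 1] + rng.normal(size=n)

alpha_hat, draws = k_step_bootstrap(y, d, X, k)
se = draws.std(ddof=1)
print(f"alpha_hat = {alpha_hat:.3f}, bootstrap s.e. = {se:.3f}")
print(f"normal-approx. 95% CI: ({alpha_hat - 1.96 * se:.3f}, {alpha_hat + 1.96 * se:.3f})")
```

In this sketch, setting `k_steps=1` means each bootstrap draw costs one coordinate-descent sweep per lasso rather than a full re-fit. The warm start is meaningful because the lasso coefficients index the same original variables in every draw, even though the estimated factors are recomputed from the resampled data.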

The University of Chicago Booth School of Business
Wednesday, September 7, 2016 - 13:00
Sala de Asamblea, Beauchef 851, floor 4