Sunday, October 6, 2019

Large Dimensional Factor Analysis with Missing Data

Back from the very strong Stevanovich meeting. Program and abstracts here. One among many highlights was:

Large Dimensional Factor Analysis with Missing Data
Presented by Serena Ng (Columbia, Dept. of Economics)

This paper introduces two factor-based imputation procedures that will fill
missing values with consistent estimates of the common component. The
first method is applicable when the missing data are bunched. The second
method is appropriate when the data are missing in a staggered or
disorganized manner. Under the strong factor assumption, it is shown that
the low rank component can be consistently estimated but there will be at
least four convergence rates, and for some entries, re-estimation can
accelerate convergence. We provide a complete characterization of the
sampling error without requiring regularization or imposing the missing at
random assumption as in the machine learning literature. The
methodology can be used in a wide range of applications, including
estimation of covariances and counterfactuals.

This paper just blew me away. Rearrange the columns of the data matrix X so that all the "complete cases across people" (the tall block, i.e., variables observed for every unit) sit in the leftmost columns, and rearrange the rows of X so that all the "complete cases across variables" (the wide block, i.e., units observed on every variable) sit in the topmost rows. The intersection is the "balanced" block in the upper left. Then iterate on the tall and wide blocks to impute the missing data in the bottom-right "missing data" block. The key figure that illustrates the procedure provided a real "eureka moment" for me. Plus they have a full asymptotic theory, as opposed to just worst-case bounds.
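To make the tall/wide idea concrete, here is a minimal Python sketch of my own; it is not the authors' code, and it surely differs from their procedure in the details. It assumes the number of factors r is known, skips demeaning and standardization, and does a single pass with no re-estimation step. The function name tall_wide_impute is hypothetical. Factors come from principal components on the tall block, loadings for every column come from least squares on the wide-block rows, and their product fills the missing entries.

```python
import numpy as np


def tall_wide_impute(X, r):
    """Fill NaNs in a T x N array X using a rank-r factor structure.

    Illustrative simplification only: r is assumed known, the data are not
    demeaned, and there is no re-estimation or iteration step.
    """
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    full_cols = ~miss.any(axis=0)  # tall block: columns observed in every row
    full_rows = ~miss.any(axis=1)  # wide block: rows observed in every column

    # Factors for all T rows, from principal components of the tall block.
    U, s, _ = np.linalg.svd(X[:, full_cols], full_matrices=False)
    F = U[:, :r] * s[:r]  # T x r

    # Loadings for all N columns, by least squares of the wide block's rows
    # on the factors of those same rows. Using the tall-block factors here
    # keeps factors and loadings in a common rotation.
    Lam, *_ = np.linalg.lstsq(F[full_rows], X[full_rows], rcond=None)  # r x N

    C = F @ Lam  # estimated common component, T x N
    return np.where(miss, C, X)  # keep observed entries, impute the rest


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N, r = 60, 40, 2
    X_full = rng.normal(size=(T, r)) @ rng.normal(size=(r, N))
    X = X_full.copy()
    X[30:, 20:] = np.nan  # bottom-right "missing data" block
    X_hat = tall_wide_impute(X, r)
    print(np.max(np.abs(X_hat - X_full)))
```

With noiseless, exactly low-rank data the missing block is recovered up to floating-point error. With noisy data the imputed entries are only estimates of the common component, which is where the paper's asymptotic theory comes in.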


I'm not sure whether the paper is circulating yet, and Serena's web site vanished recently (not her fault -- evidently Google made a massive error), but you'll soon be able to get the paper one way or another.
