An Imputation-Consistency Algorithm for High-Dimensional Missing Data Problems and Beyond

Missing data are frequently encountered in high-dimensional problems, but they are usually difficult to handle with standard algorithms such as the expectation-maximization (EM) algorithm and its variants. To tackle this difficulty, some problem-specific algorithms have been developed in the literature, but a general algorithm is still lacking. This work fills the gap: we propose a general algorithm for high-dimensional missing data problems. The proposed algorithm iterates between an imputation step and a consistency step. At the imputation step, the missing data are imputed conditional on the observed data and the current parameter estimate; at the consistency step, a consistent estimate is found for the minimizer of a Kullback-Leibler divergence defined on the pseudo-complete data. For high-dimensional problems, the consistent estimate can be found under sparsity constraints. The consistency of the averaged estimate for the true parameter can be established under quite general conditions. The proposed algorithm is illustrated using high-dimensional Gaussian graphical models, high-dimensional variable selection, and a random coefficient model.
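As a rough illustration of the two steps, the following Python sketch runs an imputation-consistency style loop on a toy problem: estimating the mean and covariance of a multivariate Gaussian whose entries are missing completely at random. It is a minimal sketch under stated assumptions, not the paper's implementation. The function names (`impute_step`, `consistency_step`, `ic_algorithm`) and the parameters `n_iter`, `burn_in`, and `seed` are hypothetical, and the consistency step here uses the plain sample mean and covariance as a low-dimensional stand-in for the sparsity-constrained estimator mentioned above.

```python
import numpy as np


def impute_step(x, mask, mu, sigma, rng):
    """Draw each row's missing entries from their Gaussian conditional
    distribution given the observed entries and the current (mu, sigma)."""
    x_imp = x.copy()
    for i in range(x.shape[0]):
        m = mask[i]            # True where the entry is missing
        if not m.any():
            continue
        o = ~m
        if not o.any():        # whole row missing: draw from the marginal
            x_imp[i] = rng.multivariate_normal(mu, sigma)
            continue
        s_oo = sigma[np.ix_(o, o)]
        s_mo = sigma[np.ix_(m, o)]
        s_mm = sigma[np.ix_(m, m)]
        w = np.linalg.solve(s_oo, s_mo.T).T      # equals s_mo @ inv(s_oo)
        cond_mean = mu[m] + w @ (x[i, o] - mu[o])
        cond_cov = s_mm - w @ s_mo.T
        cond_cov = (cond_cov + cond_cov.T) / 2   # symmetrize against round-off
        x_imp[i, m] = rng.multivariate_normal(cond_mean, cond_cov)
    return x_imp


def consistency_step(x_imp):
    """Assumed stand-in estimator: the Gaussian MLE on the pseudo-complete
    data. A high-dimensional version would use a sparse estimator instead."""
    mu = x_imp.mean(axis=0)
    sigma = np.cov(x_imp, rowvar=False, bias=True)
    return mu, sigma


def ic_algorithm(x, n_iter=50, burn_in=10, seed=0):
    """Alternate the two steps and average the estimates after burn-in.
    Missing entries of x are encoded as NaN."""
    rng = np.random.default_rng(seed)
    mask = np.isnan(x)
    mu = np.nanmean(x, axis=0)                  # crude starting values
    sigma = np.diag(np.nanvar(x, axis=0))
    mu_sum = np.zeros_like(mu)
    sigma_sum = np.zeros_like(sigma)
    for t in range(n_iter):
        x_imp = impute_step(x, mask, mu, sigma, rng)
        mu, sigma = consistency_step(x_imp)
        if t >= burn_in:
            mu_sum += mu
            sigma_sum += sigma
    kept = n_iter - burn_in
    return mu_sum / kept, sigma_sum / kept


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mu = np.array([1.0, -1.0, 0.5])
    true_sigma = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
    x = rng.multivariate_normal(true_mu, true_sigma, size=500)
    x[rng.random(x.shape) < 0.2] = np.nan       # 20% of entries missing
    mu_hat, sigma_hat = ic_algorithm(x)
    print("averaged mean estimate:", mu_hat)
```

Averaging the post-burn-in estimates mirrors the abstract's reference to the "averaged estimate"; the burn-in length and iteration count used here are arbitrary choices for the toy example.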
