Bootstrap-Based Regularization for Low-Rank Matrix Estimation

We develop a flexible framework for low-rank matrix estimation that allows us to transform noise models into regularization schemes via a simple bootstrap algorithm. Effectively, our procedure seeks an autoencoding basis for the observed matrix that is stable with respect to the specified noise model; we call the resulting procedure a stable autoencoder. In the simplest case, with an isotropic noise model, our method is equivalent to a classical singular value shrinkage estimator. For non-isotropic noise models, e.g., Poisson noise, the method does not reduce to singular value shrinkage, and instead yields new estimators that perform well in experiments. Moreover, by iterating our stable autoencoding scheme, we can automatically generate low-rank estimates without specifying the target rank as a tuning parameter.
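The abstract leaves the estimator implicit, so what follows is a minimal Python/NumPy sketch, under our own assumptions, of how a noise model can become a regularizer.

Take X to be n x p. Assume the bootstrap draws X~ = X + E, where the entries of E are independent, have mean zero and have variance v_ij. Then:

- E||X - X~ B||_F^2 = ||X - X B||_F^2 + ||S^{1/2} B||_F^2, with S = diag(sum_i v_ij).
- The stable autoencoding map is therefore the ridge-type solution B = (X^T X + S)^{-1} X^T X, and the estimate is X B.
- For isotropic Gaussian noise, S = n sigma^2 I and each singular value d shrinks to d / (1 + n sigma^2 / d^2).
- For the Poisson model assumed here (X~_ij ~ Poisson(X_ij)), S_jj is the j-th column sum of X. This is not a multiple of the identity, so the estimate is no longer a pure singular value shrinkage.

The following are all illustrative assumptions, not the authors' code:

- the function names;
- the Monte Carlo loop standing in for the bootstrap;
- the Poisson sampler;
- the iterated variant, which re-centers the noise at the current estimate.

```python
import numpy as np


def stable_autoencoder(X, S):
    """Closed-form stable autoencoder (sketch).

    Minimizes ||X - X B||_F^2 + ||S^{1/2} B||_F^2, whose solution is
    B = (X^T X + S)^{-1} X^T X, and returns the estimate X @ B.
    S is diagonal with S_jj = sum_i Var(X~_ij) under the assumed noise model.
    """
    G = X.T @ X
    B = np.linalg.solve(G + S, G)  # solves (G + S) B = G
    return X @ B


def bootstrap_autoencoder(X, sample_noisy, n_boot, rng):
    """Monte Carlo version: minimize sum_b ||X - X~_b B||_F^2 over noisy copies.

    Accumulates the normal equations (sum_b X~_b^T X~_b) B = sum_b X~_b^T X.
    """
    p = X.shape[1]
    A = np.zeros((p, p))
    C = np.zeros((p, p))
    for _ in range(n_boot):
        Xb = sample_noisy(X, rng)
        A += Xb.T @ Xb
        C += Xb.T @ X
    B = np.linalg.solve(A, C)
    return X @ B


def gaussian_noise(sigma):
    """Isotropic model X~ = X + N(0, sigma^2) entrywise, so S = n sigma^2 I."""
    def sample(X, rng):
        return X + sigma * rng.standard_normal(X.shape)
    return sample


def poisson_noise(X, rng):
    """Assumed Poisson model X~_ij ~ Poisson(X_ij), so Var = X_ij (needs X >= 0)."""
    return rng.poisson(X).astype(float)


def iterated_stable_autoencoder_gaussian(X, sigma, n_iter=500):
    """Hypothetical iteration: mu <- X B(mu), B(mu) = (mu^T mu + n sigma^2 I)^{-1} mu^T mu."""
    n, p = X.shape
    S = n * sigma**2 * np.eye(p)
    mu = X.copy()
    for _ in range(n_iter):
        G = mu.T @ mu
        mu = X @ np.linalg.solve(G + S, G)
    return mu


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, sigma = 50, 10, 1.0
    # Build X = U diag(d) V^T with known singular values d.
    U, _ = np.linalg.qr(rng.standard_normal((n, p)))
    V, _ = np.linalg.qr(rng.standard_normal((p, p)))
    d = np.array([30.0, 20.0, 10.0, 5.0, 3.0, 2.0, 1.0, 0.5, 0.2, 0.1])
    X = U @ np.diag(d) @ V.T

    # Isotropic case: compare with the shrinkage d / (1 + n sigma^2 / d^2).
    mu_sa = stable_autoencoder(X, n * sigma**2 * np.eye(p))
    print(np.linalg.svd(mu_sa, compute_uv=False))
    print(d / (1 + n * sigma**2 / d**2))

    # Bootstrap (Monte Carlo) approximation of the same estimator.
    mu_mc = bootstrap_autoencoder(X, gaussian_noise(sigma), 2000, rng)
    print(np.linalg.norm(mu_mc - mu_sa) / np.linalg.norm(mu_sa))

    # Iterated version: small singular values are driven to zero.
    mu_isa = iterated_stable_autoencoder_gaussian(X, sigma)
    print(np.linalg.matrix_rank(mu_isa, tol=1e-8))
```

Running the script prints four things:

- the closed-form singular values;
- the shrinkage formula, for comparison;
- the relative gap between the bootstrap and closed-form estimates;
- the rank of the iterated estimate.

Under this sketch's isotropic model, the iteration acts on each singular value through the scalar recursion d_hat <- d d_hat^2 / (d_hat^2 + n sigma^2). It keeps d only when d^2 > 4 n sigma^2, in which case it converges to (d + sqrt(d^2 - 4 n sigma^2)) / 2, and it drives smaller values to zero. This is one way a rank can emerge without being set as a tuning parameter.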
