Curse of Heterogeneity: Computational Barriers in Sparse Mixture Models and Phase Retrieval

We study the fundamental tradeoffs between statistical accuracy and computational tractability in the analysis of high-dimensional heterogeneous data. As examples, we consider the sparse Gaussian mixture model, the mixture of sparse linear regressions, and the sparse phase retrieval model. For these models, we exploit an oracle-based computational model to establish conjecture-free, computationally feasible minimax lower bounds, which quantify the minimum signal strength required for the existence of any algorithm that is both computationally tractable and statistically accurate. Our analysis shows that there exist significant gaps between the computationally feasible minimax risks and the classical ones. These gaps quantify the statistical price we must pay to achieve computational tractability in the presence of data heterogeneity. Our results cover the problems of detection, estimation, support recovery, and clustering, and moreover resolve several conjectures of Azizyan et al. (2013, 2015), Verzelen and Arias-Castro (2017), and Cai et al. (2016). Interestingly, our results also reveal a new and counter-intuitive phenomenon in heterogeneous data analysis: more data might lead to lower computational complexity.
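
To make the oracle-based computational model concrete, here is a minimal Python sketch of the kind of statistical query (SQ) oracle that underlies such lower bounds (cf. [49, 65, 78]), used to run a diagonal second-moment detection test for a symmetric sparse Gaussian mixture. This is an illustration under our own assumptions, not the paper's construction: the function names, the truncation constant, and the decision threshold are hypothetical, and a genuine SQ oracle may return any answer within the stated tolerance, not just a noisy empirical mean.

    import numpy as np

    def sq_oracle(data, query, tau):
        # Simulated statistical query oracle: answers E[query(x)] up to an
        # additive tolerance tau. Here we use the empirical mean plus bounded
        # noise; an adversarial oracle may return anything within tau.
        est = np.mean([query(x) for x in data])
        return est + np.random.uniform(-tau, tau)

    def detect_sparse_mixture(data, d, tau):
        # H0: x ~ N(0, I_d). H1: x ~ 0.5*N(mu, I_d) + 0.5*N(-mu, I_d) with a
        # sparse mean mu, so E[x_j^2] = 1 + mu_j^2 on the support. Flag
        # coordinate j when the oracle answer exceeds 1 by more than 3*tau;
        # with the oracle's own slack of tau, detection therefore needs a
        # signal strength of roughly mu_j^2 > 4*tau.
        flagged = []
        for j in range(d):
            m2 = sq_oracle(data, lambda x, j=j: min(x[j] ** 2, 100.0), tau)
            if m2 - 1.0 > 3.0 * tau:
                flagged.append(j)
        return len(flagged) > 0, flagged

    # Hypothetical usage: n samples in dimension d with an s-sparse mean.
    rng = np.random.default_rng(0)
    d, n, s = 50, 2000, 3
    mu = np.zeros(d)
    mu[:s] = 0.8
    data = rng.choice([-1.0, 1.0], size=n)[:, None] * mu + rng.standard_normal((n, d))
    print(detect_sparse_mixture(data, d, tau=0.05))

The oracle tolerance tau stands in for the sampling error, roughly 1/sqrt(n): any test assembled from such queries can only pick up coordinates whose signal exceeds a multiple of tau, which is the sense in which the lower bounds above are phrased as a minimum required signal strength.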

[1] Daniel M. Kane, et al. Statistical Query Lower Bounds for Robust Estimation of High-Dimensional Gaussians and Gaussian Mixtures, 2016, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[2] Balázs Szörényi. Characterizing Statistical Query Learning: Simplified Notions and Proofs, 2009, ALT.

[3] Larry A. Wasserman, et al. Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation, 2013, NIPS.

[4] Benny Applebaum, et al. On Basing Lower-Bounds for Learning on Worst-Case Assumptions, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[5] Rocco A. Servedio. Computational sample complexity and attribute-efficient learning, 1999, STOC '99.

[6] Gregory Shakhnarovich, et al. An investigation of computational and informational limits in Gaussian mixture clustering, 2006, ICML '06.

[7] Ankur Moitra, et al. Settling the Polynomial Learnability of Mixtures of Gaussians, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[8] Xinhua Zhuang, et al. Gaussian mixture density modeling, decomposition, and applications, 1996, IEEE Trans. Image Process.

[9] Shahar Mendelson, et al. Minimax rate of convergence and the performance of empirical risk minimization in phase retrieval, 2015.

[10] Wei Pan, et al. Penalized Model-Based Clustering with Application to Variable Selection, 2007, J. Mach. Learn. Res.

[11] Hui Zou, et al. Regularized Parameter Estimation in High-Dimensional Gaussian Mixture Models, 2011, Neural Computation.

[12] Philippe Rigollet, et al. Computational Lower Bounds for Sparse PCA, 2013, arXiv.

[13] Nathan Linial, et al. More data speeds up training time in learning halfspaces over sparse vectors, 2013, NIPS.

[14] K. Pearson. Contributions to the Mathematical Theory of Evolution, 1894.

[15] Douglas A. Reynolds, et al. Robust text-independent speaker identification using Gaussian mixture speaker models, 1995, IEEE Trans. Speech Audio Process.

[16] Adrian E. Raftery, et al. Model-based clustering and data transformations for gene expression data, 2001, Bioinformatics.

[17] Constantine Caramanis, et al. Regularized EM Algorithms: A Unified Framework and Statistical Guarantees, 2015, NIPS.

[18] P. Rigollet, et al. Optimal detection of sparse principal components in high dimension, 2012, arXiv:1202.5070.

[19] Cathy Maugis, et al. A non asymptotic penalized criterion for Gaussian mixture model selection, 2011.

[20] Geoffrey J. McLachlan, et al. Finite Mixture Models, 2019, Annual Review of Statistics and Its Application.

[21] Yihong Wu, et al. Computational Barriers in Minimax Submatrix Detection, 2013, arXiv.

[22] Xiaodong Li, et al. Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow, 2015, arXiv.

[23] Constantine Caramanis, et al. More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning, 2016, NIPS.

[24] Vitaly Feldman, et al. A Complete Characterization of Statistical Query Learning with Applications to Evolvability, 2009, 50th Annual IEEE Symposium on Foundations of Computer Science.

[25] J. B. Ramsey, et al. Estimating Mixtures of Normal Distributions and Switching Regressions, 1978.

[26] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[27] Santosh S. Vempala, et al. On the Complexity of Random Satisfiability Problems with Planted Solutions, 2018.

[28] Bruce E. Hajek, et al. Computational Lower Bounds for Community Detection on Random Graphs, 2014, COLT.

[29] Kunle Olukotun, et al. Map-Reduce for Machine Learning on Multicore, 2006, NIPS.

[30] Mikhail Belkin, et al. Polynomial Learning of Distribution Families, 2010, IEEE 51st Annual Symposium on Foundations of Computer Science.

[31] Zhaoran Wang, et al. High Dimensional Expectation-Maximization Algorithm: Statistical Optimization and Asymptotic Normality, 2014, arXiv:1412.8729.

[32] Jiashun Jin, et al. Rare and Weak effects in Large-Scale Inference: methods and phase diagrams, 2014, arXiv:1410.4578.

[33] G. Celeux, et al. Variable Selection for Clustering with Gaussian Mixture Models, 2009, Biometrics.

[34] I. Johnstone, et al. On Consistency and Sparsity for Principal Components Analysis in High Dimensions, 2009, Journal of the American Statistical Association.

[35] Constantine Caramanis, et al. A Convex Formulation for Mixed Regression: Near Optimal Rates in the Face of Noise, 2013, arXiv.

[36] W. DeSarbo, et al. A mixture likelihood approach for generalized linear models, 1995.

[37] Yudong Chen, et al. Statistical-Computational Tradeoffs in Planted Problems and Submatrix Localization with a Growing Number of Clusters and Submatrices, 2014, J. Mach. Learn. Res.

[38] Sanjoy Dasgupta, et al. A Two-Round Variant of EM for Gaussian Mixtures, 2000, UAI.

[39] Martin J. Wainwright, et al. Lower bounds on the performance of polynomial-time algorithms for sparse linear regression, 2014, COLT.

[40] Boaz Barak. Truth vs. Proof in Computational Complexity, 2012, Bull. EATCS.

[41] Adam Tauman Kalai, et al. Efficiently learning mixtures of two Gaussians, 2010, STOC '10.

[42] R. D. De Veaux, et al. Mixtures of linear regressions, 1989.

[43] Quentin Berthet, et al. Statistical and computational trade-offs in estimation of sparse principal components, 2014, arXiv:1408.5369.

[44] Sham M. Kakade, et al. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions, 2012, ITCS '13.

[45] L. Le Cam, et al. Asymptotic Methods in Statistical Decision Theory, 1986.

[46] Uriel Feige, et al. Relations between average case complexity and approximation complexity, 2002, STOC '02.

[47] Hongtu Zhu, et al. Hypothesis testing in mixture regression models, 2004.

[48] Xinyang Yi, et al. Alternating Minimization for Mixed Linear Regression, 2014, ICML.

[49] Santosh S. Vempala, et al. Statistical Algorithms and a Lower Bound for Detecting Planted Cliques, 2012, J. ACM.

[50] Anru R. Zhang, et al. Tensor SVD: Statistical and Computational Limits, 2017, IEEE Transactions on Information Theory.

[51] B. Nadler, et al. Do Semidefinite Relaxations Solve Sparse PCA up to the Information Limit?, 2013, arXiv:1306.3690.

[52] Roman Vershynin. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.

[53] Ankur Moitra, et al. Optimality and Sub-optimality of PCA for Spiked Random Matrices and Synchronization, 2016, arXiv.

[54] Bertrand Michel, et al. Sparse Bayesian Unsupervised Learning, 2014.

[55] Wasim Huleihel, et al. Reducibility and Computational Lower Bounds for Problems with Planted Sparse Structure, 2018, COLT.

[56] Ernst Wit, et al. High dimensional Sparse Gaussian Graphical Mixture Model, 2013, arXiv.

[57] Sanjoy Dasgupta, et al. Learning mixtures of Gaussians, 1999, 40th Annual Symposium on Foundations of Computer Science.

[58] P. Deb. Finite Mixture Models, 2008.

[59] Qingqing Huang, et al. Learning Mixtures of Gaussians in High Dimensions, 2015, STOC.

[60] Ery Arias-Castro, et al. Detection and Feature Selection in Sparse Mixture Models, 2014, arXiv:1405.1478.

[61] S. Mendelson, et al. Minimax rate of convergence and the performance of empirical risk minimization in phase recovery, 2015.

[62] Bertrand Michel, et al. Slope heuristics for variable selection and clustering via Gaussian mixtures, 2008.

[63] Sylvia Frühwirth-Schnatter, et al. Model-based clustering based on sparse finite Gaussian mixtures, 2016.

[64] Santosh S. Vempala, et al. Isotropic PCA and Affine-Invariant Clustering, 2008, 49th Annual IEEE Symposium on Foundations of Computer Science.

[65] Michael Kearns. Efficient noise-tolerant learning from statistical queries, 1993, STOC.

[66] Anima Anandkumar, et al. Tensor decompositions for learning latent variable models, 2012, J. Mach. Learn. Res.

[67] Hujun Bao, et al. Laplacian Regularized Gaussian Mixture Model for Data Clustering, 2011, IEEE Transactions on Knowledge and Data Engineering.

[68] Akshay Krishnamurthy, et al. High-Dimensional Clustering with Sparse Gaussian Mixture Models, 2010.

[69] Santosh S. Vempala, et al. Statistical Query Algorithms for Mean Vector Estimation and Stochastic Convex Optimization, 2015, SODA.

[70] Jia Li, et al. Variable Selection for Clustering by Separability Based on Ridgelines, 2012.

[71] Keinosuke Fukunaga, et al. Estimation of the Parameters of a Gaussian Mixture Using the Method of Moments, 1983, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[72] Mikhail Belkin, et al. Learning Gaussian Mixtures with Arbitrary Separation, 2009, arXiv.

[73] S. van de Geer, et al. ℓ1-penalization for mixture regression models, 2010, arXiv:1202.6046.

[74] Noga Alon, et al. Finding a large hidden clique in a random graph, 1998, SODA '98.

[75] Santosh S. Vempala, et al. A spectral algorithm for learning mixture models, 2004, J. Comput. Syst. Sci.

[76] Han Liu, et al. Challenges of Big Data Analysis, 2013, National Science Review.

[77] Marc Lelarge, et al. Fundamental limits of symmetric low-rank matrix estimation, 2016, Probability Theory and Related Fields.

[78] Zhaoran Wang, et al. Sharp Computational-Statistical Phase Transitions via Oracle Computational Model, 2015.

[79] Larry A. Wasserman, et al. Efficient Sparse Clustering of High-Dimensional Non-spherical Gaussian Mixtures, 2014, AISTATS.

[80] Cynthia Dwork, et al. Practical privacy: the SuLQ framework, 2005, PODS.

[81] Gilda Soromenho, et al. Fitting mixtures of linear regressions, 2010.

[82] N. Alon, et al. Finding a large hidden clique in a random graph, 1998.

[83] Jiaming Xu, et al. Statistical Problems with Planted Structures: Information-Theoretical and Computational Limits, 2018, Information-Theoretic Methods in Data Science.

[84] Anru R. Zhang, et al. Rate-Optimal Perturbation Bounds for Singular Subspaces with Applications to High-Dimensional Statistics, 2016, arXiv:1605.00353.

[85] Percy Liang, et al. Spectral Experts for Estimating Mixtures of Linear Regressions, 2013, ICML.

[86] A. F. Smith, et al. Statistical analysis of finite mixture distributions, 1986.

[87] Varun Kanade, et al. Computational Bounds on Statistical Query Learning, 2012, COLT.

[88] N. Balakrishnan, et al. Continuous Bivariate Distributions, 2009.

[89] Andrea Montanari, et al. Sparse PCA via Covariance Thresholding, 2013, J. Mach. Learn. Res.

[90] Mikhail Belkin, et al. The More, the Merrier: the Blessing of Dimensionality for Learning Large Gaussian Mixtures, 2013, COLT.

[91] B. Lindsay, et al. Multivariate Normal Mixtures: A Fast Consistent Method of Moments, 1993.

[92] A. Raftery, et al. Variable Selection for Model-Based Clustering, 2006.

[93] Jiahua Chen, et al. Variable Selection in Finite Mixture of Regression Models, 2007.

[94] Harrison H. Zhou, et al. Sparse CCA: Adaptive Estimation and Computational Barriers, 2014, arXiv:1409.8565.

[95] Ke Yang, et al. New Lower Bounds for Statistical Query Learning, 2002, COLT.

[96] Alan M. Frieze, et al. A Polynomial-Time Algorithm for Learning Noisy Linear Threshold Functions, 1996, Algorithmica.

[97] Martin J. Wainwright, et al. Statistical guarantees for the EM algorithm: From population to sample-based analysis, 2014, arXiv.

[98] Ke Yang. On Learning Correlated Boolean Functions Using Statistical Queries, 2001, ALT.

[99] Yudong Chen, et al. Incoherence-Optimal Matrix Completion, 2013, IEEE Transactions on Information Theory.

[100] Sanjeev Arora, et al. Learning mixtures of arbitrary Gaussians, 2001, STOC '01.

[101] Jiashun Jin, et al. Phase Transitions for High Dimensional Clustering and Related Problems, 2015, arXiv:1502.06952.

[102] Aditya Bhaskara, et al. Smoothed analysis of tensor decompositions, 2013, STOC.

[103] Tengyuan Liang, et al. Computational and Statistical Boundaries for Submatrix Localization in a Large Noisy Matrix, 2015, arXiv:1502.01988.

[104] Jeffrey C. Jackson. On the Efficiency of Noise-Tolerant PAC Algorithms Derived from Statistical Queries, 2004, Annals of Mathematics and Artificial Intelligence.