Computational and statistical tradeoffs via convex relaxation

Significance

The growth in the size and scope of datasets in science and technology has created a need for foundational perspectives on data analysis that blend computer science and statistics. In particular, the core challenge with massive datasets is to guarantee that the accuracy of an analysis procedure improves as data accrue, even under a fixed time budget. We address this problem via a notion of “algorithmic weakening,” whereby as datasets grow, the procedure backs off to cheaper algorithms, leveraging the growing inferential strength of the data to ensure that a desired level of accuracy is achieved within the computational budget.

Abstract

Modern massive datasets create a fundamental problem at the intersection of the computational and statistical sciences: how to provide guarantees on the quality of statistical inference given bounds on computational resources, such as time or space. Our approach to this problem is to define a notion of “algorithmic weakening,” in which a hierarchy of algorithms is ordered by both computational efficiency and statistical efficiency, allowing the growing strength of the data at scale to be traded off against the need for sophisticated processing. We illustrate this approach in the setting of denoising problems, using convex relaxation as the core inferential tool. Hierarchies of convex relaxations have been widely used in theoretical computer science to yield tractable approximation algorithms for many computationally intractable tasks. In this paper, we show how to endow such hierarchies with a statistical characterization and thereby obtain concrete tradeoffs relating algorithmic runtime to the amount of data.
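To make the idea of a relaxation hierarchy concrete, here is a minimal Python sketch built on assumptions of our own; it is not a reproduction of the paper's estimators or experiments. The estimator averages n noisy observations of an unknown signal and takes the Euclidean projection of that average onto a convex set containing the signal. The signal class (signed k-sparse vectors with entries in {-1, 0, +1}), the two convex sets, and the helper names `project_box`, `project_box_l1`, `denoise`, and `simulate` are all hypothetical choices for illustration. The tighter set {x : |x_i| ≤ 1, Σ|x_i| ≤ k} is the convex hull of such signals (both have the same support function, the sum of the k largest |y_i|), while the cube [-1, 1]^p is a weaker relaxation that contains it. Both projections are cheap here, so the sketch only illustrates the statistical side of the tradeoff, not a genuine computational gap.

```python
import math
import random


def project_box(y):
    """Weaker relaxation: Euclidean projection onto the cube [-1, 1]^p."""
    return [max(-1.0, min(1.0, v)) for v in y]


def project_box_l1(y, k, iters=60):
    """Tighter relaxation (hypothetical example): Euclidean projection onto
    {x : |x_i| <= 1 for all i, sum_i |x_i| <= k}.

    The KKT conditions give x_i = sign(y_i) * min(max(|y_i| - tau, 0), 1),
    where tau >= 0 is the multiplier of the l1 constraint; tau is located
    by bisection so that the l1 constraint holds.
    """
    def shrink(tau):
        # Soft-threshold by tau, then clip to [-1, 1], keeping the sign of y_i.
        return [math.copysign(min(max(abs(v) - tau, 0.0), 1.0), v) for v in y]

    x = shrink(0.0)
    if sum(abs(v) for v in x) <= k:
        return x  # l1 constraint inactive: the plain box projection is optimal
    lo, hi = 0.0, max(abs(v) for v in y)  # shrink(hi) is all zeros
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(abs(v) for v in shrink(mid)) > k:
            lo = mid
        else:
            hi = mid
    return shrink(hi)  # feasible, and within the bisection tolerance of optimal


def denoise(observations, project):
    """Average n noisy observations coordinate-wise, then project the mean."""
    n = len(observations)
    ybar = [sum(col) / n for col in zip(*observations)]
    return project(ybar)


def squared_error(x, x_true):
    return sum((a - b) ** 2 for a, b in zip(x, x_true))


def simulate(p=200, k=10, sigma=1.0, n=50, seed=0):
    """Return (tight_error, weak_error) for one synthetic denoising trial."""
    rng = random.Random(seed)
    x_true = [0.0] * p
    for i in rng.sample(range(p), k):
        x_true[i] = rng.choice([-1.0, 1.0])
    observations = [
        [xi + sigma * rng.gauss(0.0, 1.0) for xi in x_true] for _ in range(n)
    ]
    tight = denoise(observations, lambda y: project_box_l1(y, k))
    weak = denoise(observations, project_box)
    return squared_error(tight, x_true), squared_error(weak, x_true)


if __name__ == "__main__":
    for n in (10, 40, 160):
        tight_err, weak_err = simulate(n=n)
        print(f"n={n:4d}  tight={tight_err:.3f}  weak={weak_err:.3f}")
```

Running the script prints one line per sample size. Under these assumptions one would expect both errors to shrink as n grows, with the looser cube typically needing more observations to match the tighter set; a single seeded trial need not show this cleanly, so averaging over several seeds is advisable before drawing conclusions.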
