High-dimensional change-point estimation: Combining filtering with convex optimization

We consider change-point estimation in a sequence of high-dimensional signals given noisy observations. Classical approaches to this problem, such as the filtered derivative method, are useful for sequences of scalar-valued signals, but they have undesirable scaling behavior in the high-dimensional setting. However, many high-dimensional signals encountered in practice possess latent low-dimensional structure. Motivated by this observation, we propose a technique for high-dimensional change-point estimation that combines the filtered derivative approach from previous work with convex optimization methods based on atomic norm regularization, which are useful for exploiting structure in high-dimensional data. Our algorithm is applicable in online settings, as it operates on small portions of the sequence of observations at a time, and it is well-suited to the high-dimensional setting in terms of both computational scalability and statistical efficiency. The main result of this paper shows that our method performs change-point estimation reliably as long as the product of the smallest-sized change (the squared Euclidean norm of the difference between signals at a change-point) and the smallest distance between change-points (number of time instances) is larger than a Gaussian width parameter that characterizes the low-dimensional complexity of the underlying signal sequence. A full version of this paper is available online [1].
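The combination described above can be illustrated with a minimal sketch: a filtered derivative statistic computed on denoised window averages, where the denoiser is the proximal operator of an atomic norm. The sketch below assumes the simplest case, signals with latent sparse structure, so the atomic norm is the ℓ1 norm and its proximal operator is soft-thresholding; the function names, the local-maximum rule for selecting candidates, and all parameter choices are illustrative, not the paper's exact procedure.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 norm: a simple atomic-norm denoiser
    for signals with latent sparse structure."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def filtered_derivative_changepoints(Y, window, lam, threshold):
    """Flag candidate change-points in noisy observations Y (n_time x dim).

    For each time t, average the `window` observations before and after t,
    denoise each average by soft-thresholding, and record the Euclidean
    distance between the two denoised averages.  Times where this statistic
    exceeds `threshold` and is a local maximum are returned as candidates.
    """
    n = Y.shape[0]
    stats = np.zeros(n)
    for t in range(window, n - window):
        left = soft_threshold(Y[t - window:t].mean(axis=0), lam)
        right = soft_threshold(Y[t:t + window].mean(axis=0), lam)
        stats[t] = np.linalg.norm(right - left)
    candidates = [t for t in range(window, n - window)
                  if stats[t] > threshold
                  and stats[t] == stats[max(0, t - window):t + window].max()]
    return candidates, stats
```

Because the statistic is computed from short windows around each time index, the method only touches a small portion of the sequence at a time, which is what makes the online setting mentioned in the abstract feasible.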

[1]  Arnaud Guillin,et al.  Off-Line Detection of Multiple Change Points by the Filtered Derivative with p-Value Method , 2010, 1003.4148.

[2]  Jaromír Antoch,et al.  Procedures for the Detection of Multiple Changes in Series of Independent Observations , 1994 .

[3]  Michael I. Jordan,et al.  Computational and statistical tradeoffs via convex relaxation , 2012, Proceedings of the National Academy of Sciences.

[4]  Rocco A. Servedio Computational Sample Complexity and Attribute-Efficient Learning , 2000, J. Comput. Syst. Sci..

[5]  Warren P. Adams,et al.  A hierarchy of relaxation between the continuous and convex hull representations , 1990 .

[6]  John D. Lafferty,et al.  Computation-Risk Tradeoffs for Covariance-Thresholded Regression , 2013, ICML.

[7]  L. Jones A Simple Lemma on Greedy Approximation in Hilbert Space and Convergence Rates for Projection Pursuit Regression and Neural Network Training , 1992 .

[8]  Sivaraman Balakrishnan,et al.  Minimax Localization of Structural Information in Large Noisy Matrices , 2011, NIPS.

[9]  S. Frick,et al.  Compressed Sensing , 2014, Computer Vision, A Reference Guide.

[10]  A. Shiryaev On Optimum Methods in Quickest Detection Problems , 1963 .

[11]  Devavrat Shah,et al.  Inferring Rankings Using Constrained Sensing , 2009, IEEE Transactions on Information Theory.

[12]  Z. Harchaoui,et al.  Multiple Change-Point Estimation With a Total Variation Penalty , 2010 .

[13]  Pablo A. Parrilo,et al.  The Convex Geometry of Linear Inverse Problems , 2010, Foundations of Computational Mathematics.

[14]  Christopher J. Hillar,et al.  Most Tensor Problems Are NP-Hard , 2009, JACM.

[15]  Benjamin Recht,et al.  Probability of unique integer solution to a system of linear equations , 2011, Eur. J. Oper. Res..

[16]  Babak Hassibi,et al.  Tight recovery thresholds and robustness analysis for nuclear norm minimization , 2011, 2011 IEEE International Symposium on Information Theory Proceedings.

[17]  R. Rockafellar  Convex Analysis (PMS-28) , 1970 .

[18]  Toru Maruyama  Some Recent Developments in Convex Analysis , 1977 .

[19]  Rekha R. Thomas,et al.  Theta Bodies for Polynomial Ideals , 2008, SIAM J. Optim..

[20]  Rina Foygel,et al.  Corrupted Sensing: Novel Guarantees for Separating Structured Signals , 2013, IEEE Transactions on Information Theory.

[21]  Andrew R. Barron,et al.  Universal approximation bounds for superpositions of a sigmoidal function , 1993, IEEE Trans. Inf. Theory.

[22]  M. Rudelson,et al.  Sparse reconstruction by convex relaxation: Fourier and Gaussian measurements , 2006, 2006 40th Annual Conference on Information Sciences and Systems.

[23]  Hanif D. Sherali,et al.  A Hierarchy of Relaxations Between the Continuous and Convex Hull Representations for Zero-One Programming Problems , 1990, SIAM J. Discret. Math..

[24]  Michèle Basseville,et al.  Detection of abrupt changes: theory and application , 1993 .

[25]  P. Parrilo Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization , 2000 .

[26]  G. Minty On the monotonicity of the gradient of a convex function. , 1964 .

[27]  Farida Enikeeva,et al.  High-dimensional change-point detection with sparse alternatives , 2013, 1312.1900.

[28]  Tamara G. Kolda,et al.  Tensor Decompositions and Applications , 2009, SIAM Rev..

[29]  J. Moreau Proximité et dualité dans un espace hilbertien , 1965 .

[30]  Piotr Fryzlewicz,et al.  Multiple‐change‐point detection for high dimensional time series via sparsified binary segmentation , 2015, 1611.08639.

[31]  David L. Donoho,et al.  De-noising by soft-thresholding , 1995, IEEE Trans. Inf. Theory.

[32]  Peter L. Bartlett,et al.  Oracle inequalities for computationally adaptive model selection , 2012, ArXiv.

[33]  M. Wainwright,et al.  High-dimensional analysis of semidefinite relaxations for sparse principal components , 2008, 2008 IEEE International Symposium on Information Theory.

[34]  G. Simons Great Expectations: Theory of Optimal Stopping , 1973 .

[35]  Michèle Basseville,et al.  Detection of Abrupt Changes: Theory and Applications. , 1995 .

[36]  Ohad Shamir,et al.  Using More Data to Speed-up Training Time , 2011, AISTATS.

[37]  Weiyu Xu,et al.  Null space conditions and thresholds for rank minimization , 2011, Math. Program..

[38]  Pierre Bertrand A local method for estimating change points: the “Hat-function” , 2000 .

[39]  E. Candès,et al.  Near-ideal model selection by ℓ1 minimization , 2008, 0801.0345.

[40]  David L. Donoho,et al.  Optimal Shrinkage of Singular Values , 2014, IEEE Transactions on Information Theory.

[41]  P. Rigollet,et al.  Optimal detection of sparse principal components in high dimension , 2012, 1202.5070.

[42]  Michel Deza,et al.  Geometry of cuts and metrics , 2009, Algorithms and combinatorics.

[43]  N. Meinshausen,et al.  High-dimensional graphs and variable selection with the Lasso , 2006, math/0608017.

[44]  Rebecca Willett,et al.  Change-Point Detection for High-Dimensional Time Series With Missing Data , 2012, IEEE Journal of Selected Topics in Signal Processing.

[45]  Piotr Fryzlewicz,et al.  Wild binary segmentation for multiple change-point detection , 2014, 1411.0858.

[46]  Venkat Chandrasekaran,et al.  High-dimensional change-point estimation: Combining filtering with convex optimization , 2014, 2015 IEEE International Symposium on Information Theory (ISIT).

[47]  Dana Ron,et al.  Computational Sample Complexity , 1999, SIAM J. Comput..

[48]  G. Lorden PROCEDURES FOR REACTING TO A CHANGE IN DISTRIBUTION , 1971 .

[49]  N. Higham Computing the nearest correlation matrix—a problem from finance , 2002 .

[50]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[51]  Ronald A. DeVore,et al.  Some remarks on greedy algorithms , 1996, Adv. Comput. Math..

[52]  Jean B. Lasserre,et al.  Global Optimization with Polynomials and the Problem of Moments , 2000, SIAM J. Optim..

[53]  Mihailo Stojnic,et al.  Various thresholds for ℓ1-optimization in compressed sensing , 2009, ArXiv.

[54]  Badri Narayan Bhaskar,et al.  Compressed Sensing Off the Grid , 2013 .

[55]  Babak Hassibi,et al.  On a relation between the minimax risk and the phase transitions of compressed recovery , 2012, 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[56]  James Renegar,et al.  Hyperbolic Programs, and Their Derivative Relaxations , 2006, Found. Comput. Math..

[57]  Emmanuel J. Candès,et al.  Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information , 2004, IEEE Transactions on Information Theory.

[58]  E. S. Page CONTINUOUS INSPECTION SCHEMES , 1954 .

[59]  Taposh Banerjee,et al.  Quickest Change Detection , 2012, ArXiv.

[60]  Parikshit Shah,et al.  Compressed Sensing Off the Grid , 2012, IEEE Transactions on Information Theory.

[61]  P. Bickel,et al.  Covariance regularization by thresholding , 2009, 0901.3079.

[62]  Joel A. Tropp,et al.  Living on the edge: phase transitions in convex programs with random data , 2013, 1303.6672.

[63]  Martin J. Wainwright,et al.  Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using ℓ1-Constrained Quadratic Programming (Lasso) , 2009, IEEE Transactions on Information Theory.

[64]  Claudia Kirch,et al.  Change Points in High Dimensional Settings , 2014 .

[65]  Shai Shalev-Shwartz,et al.  Learning Halfspaces with the Zero-One Loss: Time-Accuracy Tradeoffs , 2012, NIPS.

[66]  G. Pisier Remarques sur un résultat non publié de B. Maurey , 1981 .

[67]  Y. Gordon On Milman's inequality and random subspaces which escape through a mesh in ℝ n , 1988 .

[68]  Gongguo Tang,et al.  Atomic Norm Denoising With Applications to Line Spectral Estimation , 2012, IEEE Transactions on Signal Processing.

[69]  M. Ledoux The concentration of measure phenomenon , 2001 .

[70]  Michael Elad,et al.  From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images , 2009, SIAM Rev..

[71]  A. Benveniste,et al.  Detection of abrupt changes in signals and dynamical systems : Some statistical aspects , 1984 .

[72]  P. Bickel,et al.  Regularized estimation of large covariance matrices , 2008, 0803.1909.

[73]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2008, Found. Comput. Math..