Rigorous Guarantees for Tyler's M-estimator via quantum expansion

Estimating the shape of an elliptical distribution is a fundamental problem in statistics. One estimator for the shape matrix, Tyler's M-estimator, has been shown to have many appealing asymptotic properties. It performs well in numerical experiments and can be quickly computed in practice by a simple iterative procedure. Although the estimator has been studied in the statistics community for many years, there was previously neither a non-asymptotic bound on its rate nor a proof that the iterative procedure converges in polynomially many steps. Here we observe a surprising connection between Tyler's M-estimator and operator scaling, which has been intensively studied in recent years in part because of its connections to the Brascamp-Lieb inequality in analysis. We use this connection, together with novel results on quantum expanders, to show that Tyler's M-estimator has the optimal rate up to factors logarithmic in the dimension, and that in the generative model the iterative procedure has a linear convergence rate even without regularization.
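As an illustration of the "simple iterative procedure" mentioned above, the following is a minimal sketch of the commonly stated fixed-point iteration for Tyler's M-estimator, written with NumPy. It is not taken from the paper: the function name `tyler_m_estimator`, the parameters `max_iter` and `tol`, the identity initialization, the trace-equals-dimension normalization, and the Frobenius-norm stopping rule are all assumptions made for this example. The samples are assumed to be centered, nonzero, and more numerous than the dimension.

```python
import numpy as np


def tyler_m_estimator(X, max_iter=200, tol=1e-9):
    """Sketch of the fixed-point iteration for Tyler's M-estimator.

    X is an (n, p) array of centered samples (assumed: n > p, no zero rows).
    Returns (sigma, n_iter), with sigma normalized so that trace(sigma) = p.
    The names, defaults, and stopping rule are illustrative assumptions.
    """
    n, p = X.shape
    sigma = np.eye(p)  # assumed starting point
    for it in range(1, max_iter + 1):
        # Quadratic forms q_i = x_i^T sigma^{-1} x_i for every sample.
        solved = np.linalg.solve(sigma, X.T)       # shape (p, n)
        q = np.einsum("ij,ji->i", X, solved)       # shape (n,)
        # Reweighted scatter: (p / n) * sum_i x_i x_i^T / q_i.
        new = (p / n) * (X.T / q) @ X
        # The shape is only defined up to scale; fix trace = p (assumed convention).
        new *= p / np.trace(new)
        # Stop when the relative change in Frobenius norm is small.
        if np.linalg.norm(new - sigma) <= tol * np.linalg.norm(sigma):
            return new, it
        sigma = new
    return sigma, max_iter
```

Calling `sigma, n_iter = tyler_m_estimator(X)` on an `(n, p)` sample array returns the estimated shape matrix and the number of iterations used. Each update depends on the samples only through their directions, since rescaling a sample rescales its quadratic form by the same factor, which is why the estimator suits elliptical data with unknown radial distribution.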
