Principled approaches to robust machine learning and beyond

As machine learning is applied to ever more important tasks, it becomes increasingly important that these algorithms are robust to systematic, or worse, malicious noise. Despite considerable interest and over sixty years of research, no efficient algorithms robust to such noise in high-dimensional settings were known for some of the most fundamental statistical tasks. In this thesis we devise two novel, but similarly inspired, algorithmic paradigms for estimation in high dimensions in the presence of a small number of adversarially added data points. Both are the first efficient algorithms to achieve (nearly) optimal error bounds for a number of fundamental statistical tasks, such as mean estimation and covariance estimation. The goal of this thesis is to present these two frameworks in a clean and unified manner. We show that these insights also apply to other problems in learning theory. Specifically, we show that these algorithms can be combined with the powerful Sum-of-Squares hierarchy to yield improvements for clustering high-dimensional Gaussian mixture models, the first such improvement in over fifteen years of research. Going full circle, we show that Sum-of-Squares can also be used to improve error rates for robust mean estimation. These algorithms are not only of theoretical interest: we demonstrate empirically that these insights can be used in practice to uncover patterns in high-dimensional data that were previously masked by noise. Based on our algorithms, we give new implementations for robust PCA, new defenses against data poisoning attacks on stochastic optimization, and new defenses against watermarking attacks on deep networks. In all of these tasks, we demonstrate on both synthetic and real data sets that our performance is substantially better than the state of the art, often detecting most or all of the corruptions when previous methods could not reliably detect any.

Thesis Supervisor: Ankur Moitra
Title: Rockwell International Career Development Associate Professor of Mathematics
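To make the filtering idea behind such robust estimators concrete, here is a minimal sketch of spectral filtering for robust mean estimation, in the spirit of the algorithms the abstract describes: outliers that shift the empirical mean by much must also inflate the top eigenvalue of the empirical covariance, so one repeatedly removes points with extreme projections onto the top eigenvector until no direction has abnormally large variance. The function name, spectral threshold, and quantile-based removal rule below are illustrative assumptions, not the tuned procedure from the thesis.

```python
import numpy as np

def filter_mean(X, eps, spectral_threshold=10.0):
    """Estimate the mean of X (an n-by-d array), a (1 - eps)-fraction of
    which comes from a distribution with bounded covariance, by repeatedly
    removing points with outlying projections onto the top eigenvector of
    the empirical covariance. Thresholds here are illustrative."""
    X = np.asarray(X, dtype=float)
    while len(X) > 1:
        mu = X.mean(axis=0)
        centered = X - mu
        cov = centered.T @ centered / len(X)
        # eigh returns eigenvalues in ascending order; take the top pair.
        eigvals, eigvecs = np.linalg.eigh(cov)
        top_val, top_vec = eigvals[-1], eigvecs[:, -1]
        # If no direction has abnormally large variance, the remaining
        # outliers can no longer shift the empirical mean by much: stop.
        if top_val <= spectral_threshold:
            return mu
        # Otherwise, scores along the top direction separate outliers from
        # inliers; remove the most extreme eps-fraction and iterate.
        scores = np.abs(centered @ top_vec)
        keep = scores <= np.quantile(scores, 1.0 - eps)
        if keep.all():  # nothing left to remove; avoid looping forever
            return mu
        X = X[keep]
    return X.mean(axis=0)
```

For instance, if 5% of the points in X are replaced by a distant planted cluster, filter_mean(X, 0.05) will typically recover the true mean far more accurately than the naive estimate X.mean(axis=0), which the planted points can drag arbitrarily far.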
