ReLU Regression with Massart Noise

We study the fundamental problem of ReLU regression, where the goal is to fit Rectified Linear Units (ReLUs) to data. This supervised learning task is efficiently solvable in the realizable setting, but is known to be computationally hard with adversarial label noise. In this work, we focus on ReLU regression in the Massart noise model, a natural and well-studied semi-random noise model. In this model, the label of every point is generated according to a function in the class, but an adversary is allowed to change this value arbitrarily with some probability, which is at most η < 1/2. We develop an efficient algorithm that achieves exact parameter recovery in this model under mild anti-concentration assumptions on the underlying distribution. Such assumptions are necessary for exact recovery to be information-theoretically possible. We demonstrate that our algorithm significantly outperforms naive applications of ℓ₁ and ℓ₂ regression on both synthetic and real data.
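
To make the noise model concrete, the sketch below generates synthetic ReLU regression data with Massart corruptions: clean labels are ReLU(⟨w*, x⟩), and each label is independently replaced by an arbitrary value with a point-dependent probability η(x) ≤ η < 1/2. This is an illustrative sketch only; the function name, the Gaussian marginals, and the specific corruption rule (large random outliers) are hypothetical choices, not the paper's construction or algorithm.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def massart_relu_data(n, d, eta_max=0.4, seed=0):
    """Synthetic ReLU regression data with Massart label noise (sketch).

    Clean labels follow a ReLU in the class: y = ReLU(<w*, x>).
    With probability eta(x) <= eta_max < 1/2, chosen per point by the
    adversary, the label is replaced by an arbitrary value.
    """
    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal(d)
    w_star /= np.linalg.norm(w_star)          # unit-norm target parameter
    X = rng.standard_normal((n, d))           # e.g. Gaussian marginals
    y_clean = relu(X @ w_star)

    # Point-dependent corruption probabilities, each capped at eta_max.
    eta_x = eta_max * rng.uniform(size=n)
    corrupt = rng.uniform(size=n) < eta_x

    # Corrupted labels can be arbitrary; large outliers are one choice.
    y = y_clean.copy()
    y[corrupt] = 10.0 * rng.standard_normal(corrupt.sum())
    return X, y, w_star


X, y, w_star = massart_relu_data(n=1000, d=10)
```

On data of this kind, a naive ℓ₂ fit is pulled toward the adversarial labels, and an ℓ₁ fit, while more outlier-resistant, need not recover w* exactly; this is the baseline behavior the empirical comparison in the abstract refers to.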
