On the Cryptographic Hardness of Learning Single Periodic Neurons

We give a simple reduction demonstrating the cryptographic hardness of learning a single periodic neuron over isotropic Gaussian distributions in the presence of noise. More precisely, our reduction shows that the existence of any polynomial-time algorithm (not necessarily gradient-based) for learning such functions under small noise implies a polynomial-time quantum algorithm for solving worst-case lattice problems, whose hardness forms the foundation of lattice-based cryptography. Our core hard family of functions, which is well-approximated by one-layer neural networks, takes the general form of a univariate periodic function applied to an affine projection of the data. These functions have appeared in previous seminal works demonstrating their hardness against gradient-based algorithms (Shamir ’18) and Statistical Query (SQ) algorithms (Song et al. ’17). We show that if (polynomially) small noise is added to the labels, the intractability of learning these functions applies to all polynomial-time algorithms, beyond gradient-based and SQ algorithms, under the aforementioned cryptographic assumptions. Moreover, we demonstrate the necessity of noise in the hardness result by designing a polynomial-time algorithm that learns certain families of such functions under exponentially small adversarial noise. Our proposed algorithm is neither gradient-based nor an SQ algorithm; rather, it is based on the celebrated Lenstra-Lenstra-Lovász (LLL) lattice basis reduction algorithm. Furthermore, in the absence of noise, this algorithm can be directly applied to solve CLWE detection (Bruna et al. ’21) and phase retrieval with an optimal sample complexity of d + 1 samples. In the former case, this improves upon the quadratic-in-d sample complexity required in (Bruna et al. ’21).
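To make the hard function family concrete, the sketch below generates labeled data of the form described above: isotropic Gaussian inputs passed through a univariate periodic function of a one-dimensional projection, with small label noise. This is a minimal illustrative sketch; the cosine activation, the frequency parameter gamma, the noise level beta, and the function name sample_periodic_neuron are assumptions made here for illustration, not details taken from the paper.

```python
import numpy as np

def sample_periodic_neuron(n, d, gamma, beta, seed=0):
    """Draw n labeled examples (X, y) from a single periodic neuron.

    Illustrative model (an assumption, not the paper's exact construction):
        x ~ N(0, I_d),   y = cos(2*pi*gamma*<w, x>) + beta * N(0, 1),
    where w is a hidden unit-norm direction, gamma sets the frequency of the
    periodic activation, and beta is the label-noise level.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)                      # hidden unit direction
    X = rng.standard_normal((n, d))             # isotropic Gaussian inputs
    y = np.cos(2.0 * np.pi * gamma * (X @ w)) + beta * rng.standard_normal(n)
    return X, y, w

# d + 1 samples: the sample size at which the noiseless LLL-based algorithm
# discussed in the abstract is claimed to succeed.
d = 32
X, y, w = sample_periodic_neuron(n=d + 1, d=d, gamma=np.sqrt(d), beta=1e-3)
```

In the regime described in the abstract, polynomially small beta makes the learning problem cryptographically hard, whereas with exponentially small (or zero) noise an LLL-based procedure, which is neither gradient-based nor an SQ algorithm, can recover the hidden direction from as few as d + 1 samples.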

[1]  M. Rudelson,et al.  Non-asymptotic theory of random matrices: extreme singular values , 2010, 1003.2990.

[2]  Yishay Mansour,et al.  Weakly learning DNF and characterizing statistical query learning using Fourier analysis , 1994, STOC '94.

[3]  Ohad Shamir,et al.  Distribution-Specific Hardness of Learning Neural Networks , 2016, J. Mach. Learn. Res..

[4]  Adam R. Klivans,et al.  Approximation Schemes for ReLU Regression , 2020, COLT.

[6]  Tengyu Ma,et al.  Learning One-hidden-layer Neural Networks with Landscape Design , 2017, ICLR.

[7]  Denis Simon,et al.  Selected Applications of LLL in Number Theory , 2010, The LLL Algorithm.

[8]  Quanquan Gu,et al.  Agnostic Learning of a Single Neuron with Gradient Descent , 2020, NeurIPS.

[9]  G. B. Arous,et al.  Online stochastic gradient descent on non-convex losses from high-dimensional inference , 2020, J. Mach. Learn. Res..

[10]  Adam R. Klivans,et al.  Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals , 2019, NeurIPS.

[11]  Florent Krzakala,et al.  Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval , 2020, NeurIPS.

[12]  Shafi Goldwasser,et al.  Complexity of lattice problems - a cryptographic perspective , 2002, The Kluwer international series in engineering and computer science.

[13]  C. P. Schnorr,et al.  A Hierarchy of Polynomial Time Lattice Basis Reduction Algorithms , 1987, Theor. Comput. Sci..

[14]  Sundeep Rangan,et al.  Generalized approximate message passing for estimation with random linear mixing , 2010, 2011 IEEE International Symposium on Information Theory Proceedings.

[15]  Daniele Micciancio,et al.  Practical, Predictable Lattice Basis Reduction , 2016, EUROCRYPT.

[16]  Le Song,et al.  On the Complexity of Learning Neural Networks , 2017, NIPS.

[17]  Noah Stephens-Davidowitz On the Gaussian Measure Over Lattices , 2017 .

[18]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[19]  Pierfrancesco Urbani,et al.  Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem , 2021, Mach. Learn. Sci. Technol..

[20]  W. Ebeling Lattices and Codes: A Course Partially Based on Lectures by Friedrich Hirzebruch , 1994 .

[22]  Alexandr Andoni,et al.  Correspondence retrieval , 2017, COLT.

[23]  Adam Tauman Kalai,et al.  Efficient Learning of Generalized Linear and Single Index Models with Isotonic Regression , 2011, NIPS.

[24]  Mark Jerrum,et al.  Large Cliques Elude the Metropolis Process , 1992, Random Struct. Algorithms.

[25]  J R Fienup,et al.  Phase retrieval algorithms: a comparison. , 1982, Applied optics.

[26]  Amit Daniely,et al.  From Local Pseudorandom Generators to Hardness of Learning , 2021, COLT.

[27]  Balázs Szörényi Characterizing Statistical Query Learning: Simplified Notions and Proofs , 2009, ALT.

[28]  R. Balan,et al.  On signal reconstruction without phase , 2006 .

[29]  Martin J. Wainwright  High-Dimensional Statistics: A Non-Asymptotic Viewpoint, 2019, Cambridge University Press.

[30]  Andrea Montanari,et al.  Message-passing algorithms for compressed sensing , 2009, Proceedings of the National Academy of Sciences.

[31]  Emmanuel J. Candès,et al.  PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming , 2011, ArXiv.

[32]  Michael Kearns,et al.  Efficient noise-tolerant learning from statistical queries , 1993, STOC.

[33]  Nicolas Gama,et al.  Finding short lattice vectors within mordell's inequality , 2008, STOC.

[34]  Ravi Kannan,et al.  Improved algorithms for integer programming and related lattice problems , 1983, STOC.

[35]  Yuanzhi Li,et al.  Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers , 2018, NeurIPS.

[36]  David Gamarnik,et al.  Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm , 2017, 1711.04952.

[37]  Oded Regev,et al.  On lattices, learning with errors, random linear codes, and cryptography , 2005, STOC '05.

[38]  A. K. Lenstra,et al.  Factoring polynomials with rational coefficients, 1982, Mathematische Annalen.

[39]  Yonina C. Eldar,et al.  Phase Retrieval: An Overview of Recent Developments , 2015, ArXiv.

[40]  Oded Regev,et al.  Tensor-based hardness of the shortest vector problem to within almost polynomial factors , 2007, STOC '07.

[41]  Santosh S. Vempala,et al.  Statistical Algorithms and a Lower Bound for Detecting Planted Cliques , 2012, J. ACM.

[42]  Noga Alon,et al.  Finding a large hidden clique in a random graph , 1998, SODA '98.

[43]  Damien Stehlé,et al.  An LLL Algorithm with Quadratic Complexity , 2009, SIAM J. Comput..

[44]  Dorit Aharonov,et al.  Lattice problems in NP ∩ coNP , 2005, JACM.

[45]  Daniele Micciancio Lattice-Based Cryptography , 2011, Encyclopedia of Cryptography and Security.

[47]  Afonso S. Bandeira,et al.  Notes on computational-to-statistical gaps: predictions using statistical physics , 2018, Portugaliae Mathematica.

[48]  Afonso S. Bandeira,et al.  Notes on Computational Hardness of Hypothesis Testing: Predictions using the Low-Degree Likelihood Ratio , 2019, ArXiv.

[49]  Ryan O'Donnell,et al.  Noise stability of functions with low influences: Invariance and optimality , 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[50]  Michael Kharitonov,et al.  Cryptographic hardness of distribution-specific learning , 1993, STOC.

[51]  Alan M. Frieze,et al.  On the Lagarias-Odlyzko Algorithm for the Subset Sum Problem , 1986, SIAM J. Comput..

[52]  Subhash Khot,et al.  Hardness of approximating the shortest vector problem in lattices , 2004, 45th Annual IEEE Symposium on Foundations of Computer Science.

[53]  Adi Shamir,et al.  A polynomial time algorithm for breaking the basic Merkle-Hellman cryptosystem , 1984, 23rd Annual Symposium on Foundations of Computer Science (sfcs 1982).

[54]  Mahdi Soltanolkotabi,et al.  Learning ReLUs via Gradient Descent , 2017, NIPS.

[55]  Adam R. Klivans,et al.  Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent , 2020, ICML.

[56]  Ohad Shamir,et al.  Failures of Gradient-Based Deep Learning , 2017, ICML.

[57]  Nicolas Macris,et al.  Optimal errors and phase transitions in high-dimensional generalized linear models , 2017, Proceedings of the National Academy of Sciences.

[58]  Inderjit S. Dhillon,et al.  Recovery Guarantees for One-hidden-layer Neural Networks , 2017, ICML.

[59]  O. Regev,et al.  Continuous LWE , 2020, Electron. Colloquium Comput. Complex..

[60]  Yuxin Chen,et al.  Gradient descent with random initialization: fast global convergence for nonconvex phase retrieval , 2018, Mathematical Programming.

[61]  David Swanson,et al.  The co-area formula for Sobolev mappings , 2001 .

[62]  Florent Krzakala,et al.  Phase retrieval in high dimensions: Statistical and computational phase transitions , 2020, NeurIPS.

[63]  Daniel M. Kane,et al.  Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks , 2020, COLT.

[64]  David Gamarnik,et al.  High Dimensional Linear Regression using Lattice Basis Reduction , 2018, NeurIPS.

[65]  Stanislaw J. Szarek,et al.  Condition numbers of random matrices , 1991, J. Complex..

[66]  David Gamarnik,et al.  Inference in High-Dimensional Linear Regression via Lattice Basis Reduction and Integer Relation Detection , 2019, IEEE Transactions on Information Theory.

[68]  David Gamarnik,et al.  The Landscape of the Planted Clique Problem: Dense subgraphs and the Overlap Gap Property , 2019, ArXiv.

[70]  L. Ronkin Liouville's theorems for functions holomorphic on the zero set of a polynomial , 1979 .

[73]  Damien Stehlé,et al.  CRYSTALS - Dilithium: Digital Signatures from Module Lattices , 2017, IACR Cryptol. ePrint Arch..

[74]  Jeffrey C. Lagarias,et al.  Solving low density subset sum problems , 1983, 24th Annual Symposium on Foundations of Computer Science (sfcs 1983).

[75]  Martin J. Wainwright,et al.  Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso), 2009, IEEE Transactions on Information Theory.

[76]  Shai Shalev-Shwartz,et al.  SGD Learns Over-parameterized Networks that Provably Generalize on Linearly Separable Data , 2017, ICLR.

[77]  Florent Krzakala,et al.  Marvels and Pitfalls of the Langevin Algorithm in Noisy High-dimensional Inference , 2018, Physical Review X.

[78]  Shai Ben-David,et al.  Understanding Machine Learning: From Theory to Algorithms , 2014 .

[79]  Pravesh Kothari,et al.  A Nearly Tight Sum-of-Squares Lower Bound for the Planted Clique Problem , 2016, 2016 IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS).

[80]  Ray A. Perlner,et al.  Status report on the second round of the NIST post-quantum cryptography standardization process , 2020 .

[81]  Tom Goldstein,et al.  PhaseMax: Convex Phase Retrieval via Basis Pursuit , 2016, IEEE Transactions on Information Theory.

[82]  Emmanuel Abbe,et al.  Poly-time universality and limitations of deep learning , 2020, ArXiv.

[84]  Alexander A. Sherstov,et al.  Cryptographic Hardness for Learning Intersections of Halfspaces , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[85]  Jeffrey C. Lagarias,et al.  Knapsack Public Key Cryptosystems and Diophantine Approximation , 1983, CRYPTO.

[86]  Divesh Aggarwal,et al.  Slide Reduction, Revisited - Filling the Gaps in SVP Approximation , 2019, CRYPTO.

[87]  A. Carbery,et al.  Distributional and $L^q$ norm inequalities for polynomials over convex bodies in $\mathbb{R}^n$, 2001.

[88]  Leslie G. Valiant,et al.  Cryptographic Limitations on Learning Boolean Formulae and Finite Automata , 1993, Machine Learning: From Theory to Applications.

[89]  Raghu Meka,et al.  Anti-concentration for Polynomials of Independent Random Variables , 2016, Theory Comput..

[90]  Nicolas Macris,et al.  The committee machine: computational to statistical gaps in learning a two-layers neural network , 2018, NeurIPS.

[91]  Claus-Peter Schnorr,et al.  Lattice Basis Reduction: Improved Practical Algorithms and Solving Subset Sum Problems , 1991, FCT.

[92]  R. Vershynin  High-Dimensional Probability: An Introduction with Applications in Data Science, 2018, Cambridge University Press.

[93]  Galen Reeves,et al.  All-or-Nothing Phenomena: From Single-Letter to High Dimensions , 2019, 2019 IEEE 8th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).