Time-space lower bounds for two-pass learning

A line of recent works showed that for a large class of learning problems, any learning algorithm requires either super-linear memory size or a super-polynomial number of samples [11, 7, 12, 9, 2, 5]. For example, any algorithm for learning parities of size n requires either a memory of size Ω(n^2) or an exponential number of samples [11].

All these works modeled the learner as a one-pass branching program, allowing only a single pass over the stream of samples. In this work, we prove the first memory-samples lower bounds (with a super-linear lower bound on the memory size and a super-polynomial lower bound on the number of samples) when the learner is allowed two passes over the stream of samples. For example, we prove that any two-pass algorithm for learning parities of size n requires either a memory of size Ω(n^{1.5}) or at least 2^{Ω(√n)} samples.

More generally, a matrix M : A × X → {−1, 1} corresponds to the following learning problem: an unknown element x ∈ X is chosen uniformly at random, and a learner tries to learn x from a stream of samples (a_1, b_1), (a_2, b_2), ..., where for every i, a_i ∈ A is chosen uniformly at random and b_i = M(a_i, x). Assume that k, ℓ, r are such that any submatrix of M with at least 2^{−k} · |A| rows and at least 2^{−ℓ} · |X| columns has bias of at most 2^{−r}. We show that any two-pass learning algorithm for the learning problem corresponding to M requires either a memory of size at least Ω(k · min{k, √ℓ}), or at least 2^{Ω(min{k, √ℓ, r})} samples. For parities of size n one can take k, ℓ, r = Θ(n), which recovers the Ω(n^{1.5}) memory versus 2^{Ω(√n)} samples trade-off stated above.
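To make the streaming model concrete, the following is a minimal Python sketch (illustrative code, not from the paper) of the parity instance: A = X = {0,1}^n, M(a, x) = (−1)^{⟨a,x⟩ mod 2}, and the learner receives pairs (a_i, b_i) with a_i uniform. The learner shown is the standard Gaussian-elimination strategy, which keeps a row-reduced basis of the GF(2) equations it has seen; it uses Θ(n^2) bits of memory and O(n) samples, exactly the memory regime that the one-pass lower bound of [11] shows is unavoidable for sample-efficient learning. The bit-mask encoding, function names, and the toy parameter n are illustrative choices, not taken from the paper.

```python
import random

n = 12  # toy size; the lower bounds are asymptotic statements in n


def M(a, x):
    """Parity matrix entry: +1 if <a, x> is even, -1 if odd (a, x are n-bit masks)."""
    return -1 if bin(a & x).count("1") % 2 == 1 else 1


def sample_stream(x, rng):
    """Unbounded stream of samples (a_i, b_i) with a_i uniform in {0,1}^n and b_i = M(a_i, x)."""
    while True:
        a = rng.getrandbits(n)
        yield a, M(a, x)


def one_pass_gaussian_learner(stream):
    """One-pass learner storing a row-reduced basis of the GF(2) equations <a, x> = c.
    It keeps at most n rows of n+1 bits, i.e. Theta(n^2) bits of memory, and with
    high probability determines x after O(n) samples."""
    rows = []  # (pivot, row, rhs) triples in insertion order; pivots are distinct
    for a, b in stream:
        c = 0 if b == 1 else 1        # b = (-1)^{<a,x>}  =>  <a, x> = c (mod 2)
        for pivot, row, rhs in rows:  # reduce the incoming equation against the basis
            if (a >> pivot) & 1:
                a ^= row
                c ^= rhs
        if a == 0:
            continue                  # linearly dependent sample; nothing new learned
        rows.append((a.bit_length() - 1, a, c))
        if len(rows) == n:            # full rank: x is determined
            break
    x = 0                             # back-substitute in reverse insertion order
    for pivot, row, rhs in reversed(rows):
        if rhs ^ (bin(row & x).count("1") % 2):
            x |= 1 << pivot
    return x


rng = random.Random(0)
secret = rng.getrandbits(n)
recovered = one_pass_gaussian_learner(sample_stream(secret, rng))
print(recovered == secret)  # True: with Theta(n^2) memory, O(n) samples suffice
```

The point of the lower bounds above is that once the memory budget drops below the Θ(n^2) used by this sketch in the one-pass setting, or below Ω(n^{1.5}) in the two-pass setting, no strategy can succeed without an exponential (respectively 2^{Ω(√n)}) number of samples.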

[1] Miklos Santha et al. Generating Quasi-Random Sequences from Slightly-Random Sources (Extended Abstract), 1984, FOCS.

[2] Ran Raz et al. Extractor-based time-space lower bounds for learning, 2017, Electron. Colloquium Comput. Complex.

[3] Ran Raz et al. Interactive channel capacity, 2013, STOC '13.

[4] Ran Raz et al. Time-space hardness of learning sparse parities, 2017, Electron. Colloquium Comput. Complex.

[5] Dana Moshkovitz et al. Entropy Samplers and Strong Generic Lower Bounds For Space Bounded Learning, 2018, ITCS.

[6] Ohad Shamir et al. Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation, 2013, NIPS.

[7] Oded Goldreich et al. Unbiased Bits from Sources of Weak Randomness and Probabilistic Communication Complexity, 1988, SIAM J. Comput.

[8] Ohad Shamir et al. Detecting Correlations with Little Memory and Communication, 2018.

[9] Gregory Valiant et al. Information Theoretically Secure Databases, 2016, Electron. Colloquium Comput. Complex.

[10] Ran Raz. Fast Learning Requires Good Memory: A Time-Space Lower Bound for Parity Learning, 2018.

[11] David A. Mix Barrington et al. Bounded-width polynomial-size branching programs recognize exactly those languages in NC1, 1986, STOC '86.

[12] Ran Raz et al. A Time-Space Lower Bound for a Large Class of Learning Problems, 2017, 2017 IEEE 58th Annual Symposium on Foundations of Computer Science (FOCS).

[13] Gregory Valiant et al. Memory, Communication, and Statistical Queries, 2016, COLT.

[14] Dana Moshkovitz et al. Mixing Implies Lower Bounds for Space Bounded Learning, 2017, COLT.

[15] Naftali Tishby et al. Mixing Complexity and its Applications to Neural Networks, 2017, ArXiv.

[16] Xin Yang et al. Time-Space Tradeoffs for Learning Finite Functions from Random Evaluations, with Applications to Polynomials, 2018, Electron. Colloquium Comput. Complex.