Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random with the exception of two vectors that have Pearson correlation $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants $\epsilon, \rho > 0$ and $d \gg \frac{\log n}{\rho^2}$, finds the correlated pair with high probability, and runs in time $O(n^{\frac{3\omega}{4}+\epsilon}) < O(n^{1.8})$, where $\omega < 2.38$ is the exponent of matrix multiplication. Provided that $d$ is sufficiently large, this runtime can be further reduced. These are the first subquadratic-time algorithms for this problem for which $\rho$ does not appear in the exponent of $n$, and they improve upon the $O(n^{2-O(\rho)})$ runtimes given by Paturi et al. [15], Locality Sensitive Hashing (LSH) [11], and the Bucketing Codes approach [6]. Applications and extensions of this basic algorithm yield improved algorithms for several other problems:

Approximate Closest Pair: For any sufficiently small constant $\epsilon > 0$, given $n$ vectors in $\mathbb{R}^d$, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including LSH) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parity with Noise: Given samples from an instance of the learning parity with noise problem where each example has length $n$, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, our algorithm identifies the set of $k$ indices in time $n^{\frac{\omega+\epsilon}{3}k}\,\mathrm{poly}\left(\frac{1}{1-2\eta}\right) < n^{0.8k}\,\mathrm{poly}\left(\frac{1}{1-2\eta}\right)$. This is the first algorithm with no dependence on $\eta$ in the exponent of $n$, aside from the trivial brute-force algorithm.

Learning k-Juntas with Noise: Given uniformly random length-$n$ Boolean vectors, each together with a label that is some function of just $k \ll n$ of the bits, perturbed by noise of rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al. [7], our result for learning $k$-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k}\,\mathrm{poly}\left(\frac{1}{1-2\eta}\right) < n^{0.8k}\,\mathrm{poly}\left(\frac{1}{1-2\eta}\right)$, which improves on the previous best of $> n^{k(1-2/2^k)}\,\mathrm{poly}\left(\frac{1}{1-2\eta}\right)$, from [8].

Learning k-Juntas without Noise: Our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime $n^{\frac{\omega+\epsilon}{4}k}\,\mathrm{poly}(n) < n^{0.6k}\,\mathrm{poly}(n)$, which improves on the runtime $n^{\frac{\omega}{\omega+1}k}\,\mathrm{poly}(n) \approx n^{0.7k}\,\mathrm{poly}(n)$ of Mossel et al. [13].
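To make the correlated-pair problem concrete, the following is a minimal sketch of the problem setup together with the trivial quadratic-time baseline. It is not the subquadratic algorithm described above. The function names `plant_correlated_pair` and `brute_force_pair`, the choice of planting the pair in rows 0 and 1, and the parameter values are all assumptions made for illustration. Vectors are represented with entries in {-1, +1}, so the inner product of two vectors divided by $d$ is their empirical correlation.

```python
import numpy as np


def plant_correlated_pair(n, d, rho, rng):
    """Return an n x d matrix of +/-1 entries (hypothetical helper).

    All rows are uniformly random, except that row 1 is a noisy copy of
    row 0 with expected correlation rho.
    """
    X = rng.choice([-1, 1], size=(n, d))
    # Flip each coordinate of row 0 independently with probability
    # (1 - rho) / 2, so the expected Hamming distance is d * (1 - rho) / 2
    # and the expected inner product is d * rho.
    flips = rng.random(d) < (1 - rho) / 2
    X[1] = np.where(flips, -X[0], X[0])
    return X


def brute_force_pair(X):
    """Quadratic baseline: return the pair of rows with the largest inner product."""
    G = X @ X.T  # all n^2 pairwise inner products
    # Exclude self-pairs, whose inner product is always d.
    np.fill_diagonal(G, np.iinfo(G.dtype).min)
    i, j = np.unravel_index(np.argmax(G), G.shape)
    return tuple(sorted((int(i), int(j))))


# Example parameters (assumed): d is far larger than log(n) / rho^2.
rng = np.random.default_rng(0)
n, d, rho = 200, 2000, 0.5
X = plant_correlated_pair(n, d, rho, rng)
pair = brute_force_pair(X)
hamming = int(np.sum(X[0] != X[1]))
print("recovered pair:", pair)
print("Hamming distance of planted pair:", hamming, "expected about", d * (1 - rho) / 2)
```

With $d \gg \log n / \rho^2$, the planted pair's inner product (about $d\rho$) stands well clear of the inner products of the uncorrelated pairs (which fluctuate on the order of $\sqrt{d \log n}$), which is why the baseline recovers it. The baseline computes all $n^2$ inner products; the point of the result above is to avoid that cost.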
Vitaly Feldman et al. New Results for Learning Noisy Parities and Halfspaces. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
Rasmus Pagh et al. Compressed matrix multiplication. ITCS '12.
Noga Alon et al. Approximating the cut-norm via Grothendieck's inequality. STOC '04.
Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time. COLT '90.
A. Ron et al. Strictly positive definite functions on spheres in Euclidean spaces. Math. Comput.
Santosh S. Vempala et al. On Noise-Tolerant Learning of Sparse Parities and Related Problems.
Alexandr Andoni et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).
Piotr Indyk et al. Similarity Search in High Dimensions via Hashing.
Piotr Indyk et al. Approximate nearest neighbors: towards removing the curse of dimensionality. STOC '98.
Don Coppersmith et al. Rectangular Matrix Multiplication Revisited. J. Complex.
Russell Impagliazzo et al. How to recycle random bits. 30th Annual Symposium on Foundations of Computer Science.
Moshe Dubiner et al. Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem. IEEE Transactions on Information Theory.
Manuel Blum et al. Secure Human Identification Protocols.
Sanguthevar Rajasekaran et al. The light bulb problem. COLT '89.
Vadim Lyubashevsky et al. The Parity Problem in the Presence of Noise, Decoding Random Linear Codes, and the Subset Sum Problem.
Virginia Vassilevska Williams et al. Multiplying matrices faster than Coppersmith-Winograd. STOC '12.
Ryan O'Donnell et al. Learning functions of k relevant variables. J. Comput. Syst. Sci.
J. Dicapua. Chebyshev Polynomials. Fibonacci and Lucas Numbers With Applications.
Yishay Mansour et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. STOC '94.
Leslie G. Valiant et al. Functionality in neural nets. COLT '88.