Given a set of $n$ $d$-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random, with the exception of two vectors that have Pearson correlation $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants $\epsilon, \rho > 0$ and $d \gg \frac{\log n}{\rho^2}$, finds the correlated pair with high probability, and runs in time $O(n^{3\omega/4 + \epsilon}) < O(n^{1.8})$, where $\omega < 2.38$ is the exponent of matrix multiplication. Provided that $d$ is sufficiently large, this runtime can be further reduced. These are the first subquadratic-time algorithms for this problem in which $\rho$ does not appear in the exponent of $n$, improving upon the $O(n^{2-O(\rho)})$ runtimes given by Paturi et al. [15], Locality Sensitive Hashing (LSH) [11], and the Bucketing Codes approach [6]. Applications and extensions of this basic algorithm yield improved algorithms for several other problems:

Approximate Closest Pair: For any sufficiently small constant $\epsilon > 0$, given $n$ vectors in $\mathbb{R}^d$, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including LSH) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parities with Noise: Given samples from an instance of the learning parities with noise problem where each example has length $n$, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, our algorithm identifies the set of $k$ indices in time $n^{\frac{\omega+\epsilon}{3}k} \cdot \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$. Aside from the trivial brute-force algorithm, this is the first algorithm with no dependence on $\eta$ in the exponent of $n$.

Learning k-Juntas with Noise: Given uniformly random length-$n$ Boolean vectors, together with a label which is some function of just $k \ll n$ of the bits, perturbed by noise rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al. [7], our result for learning $k$-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k} \cdot \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$, which improves on the previous best of at least $n^{k(1-2/2^k)} \cdot \mathrm{poly}(\frac{1}{1-2\eta})$, from [8].

Learning k-Juntas without Noise: Our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime $n^{\frac{\omega+\epsilon}{4}k} \cdot \mathrm{poly}(n) < n^{0.6k} \cdot \mathrm{poly}(n)$, which improves on the runtime $n^{\frac{\omega}{\omega+1}k} \cdot \mathrm{poly}(n) \approx n^{0.7k} \cdot \mathrm{poly}(n)$ of Mossel et al. [13].
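To make the aggregation idea behind the main algorithm concrete, here is a minimal NumPy sketch. It is our own illustration under stated assumptions, not the paper's full algorithm: the function name find_correlated_pair, the group size of roughly $n^{1/4}$, and the toy parameters are illustrative choices, and the sketch omits the amplification/embedding step the full algorithm uses to handle dimensions as small as $\frac{\log n}{\rho^2}$, so it only succeeds when $d$ is considerably larger (on the order of $\sqrt{n}/\rho^2$, up to logarithmic factors).

```python
import numpy as np

def find_correlated_pair(X, rng=None):
    """Given an (n, d) {0,1} matrix X whose rows are uniform except for one
    planted correlated pair, return the indices of the suspected pair.
    Minimal sketch of the aggregation idea only (see caveats above)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    S = 2.0 * X - 1.0  # map {0,1} -> {-1,+1} so inner products track correlation

    # Randomly partition the rows into ~n^{3/4} groups of ~n^{1/4} vectors each,
    # and aggregate each group by summing its vectors.
    group_size = max(1, round(n ** 0.25))
    perm = rng.permutation(n)
    groups = [perm[i:i + group_size] for i in range(0, n, group_size)]
    Z = np.stack([S[g].sum(axis=0) for g in groups])  # shape: (#groups, d)

    # One matrix product compares all pairs of groups at once; the pair of
    # groups holding the planted vectors should yield an outlying entry
    # (ignoring the rare event that both planted vectors share a group).
    W = Z @ Z.T
    np.fill_diagonal(W, 0.0)
    a, b = np.unravel_index(np.argmax(np.abs(W)), W.shape)

    # Brute-force only the ~2 n^{1/4} vectors inside the two flagged groups.
    cand = np.concatenate([groups[a], groups[b]])
    G = np.abs(S[cand] @ S[cand].T)
    np.fill_diagonal(G, 0.0)
    i, j = np.unravel_index(np.argmax(G), G.shape)
    return int(cand[i]), int(cand[j])

# Toy usage: plant a pair with Pearson correlation ~rho among uniform vectors.
n, d, rho = 1000, 5000, 0.7
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(n, d))
X[1] = np.where(rng.random(d) < (1 + rho) / 2, X[0], 1 - X[0])
print(find_correlated_pair(X, rng))  # expect (0, 1), in some order, w.h.p.
```

With roughly $n^{3/4}$ aggregated rows, the single product Z @ Z.T is where the $n^{3\omega/4}$ term in the stated runtime would come from if computed with fast matrix multiplication; NumPy's @ dispatches to cubic-time BLAS, so the sketch illustrates the structure of the algorithm rather than its exponent.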
[1] Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces. FOCS 2006.
[2] Rasmus Pagh, et al. Compressed Matrix Multiplication. ITCS 2012.
[3] Noga Alon, et al. Approximating the Cut-Norm via Grothendieck's Inequality. STOC 2004.
[4] Karsten A. Verbeurgt. Learning DNF under the Uniform Distribution in Quasi-polynomial Time. COLT 1990.
[5] A. Ron, et al. Strictly Positive Definite Functions on Spheres in Euclidean Spaces. Math. Comput., 1994.
[6] Santosh S. Vempala, et al. On Noise-Tolerant Learning of Sparse Parities and Related Problems. ALT 2011.
[7] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions. FOCS 2006.
[8] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing. VLDB 1999.
[9] Piotr Indyk, et al. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998.
[10] Don Coppersmith, et al. Rectangular Matrix Multiplication Revisited. J. Complex., 1997.
[11] Russell Impagliazzo, et al. How to Recycle Random Bits. FOCS 1989.
[12] Moshe Dubiner, et al. Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem. IEEE Transactions on Information Theory, 2008.
[13] Manuel Blum, et al. Secure Human Identification Protocols. ASIACRYPT 2001.
[14] Sanguthevar Rajasekaran, et al. The Light Bulb Problem. COLT '89, 1995.
[15] Vadim Lyubashevsky, et al. The Parity Problem in the Presence of Noise, Decoding Random Linear Codes, and the Subset Sum Problem. APPROX-RANDOM 2005.
[16] Virginia Vassilevska Williams, et al. Multiplying Matrices Faster than Coppersmith-Winograd. STOC 2012.
[17] Ryan O'Donnell, et al. Learning Functions of k Relevant Variables. J. Comput. Syst. Sci., 2004.
[18] J. Dicapua. Chebyshev Polynomials. Fibonacci and Lucas Numbers with Applications, 2019.
[19] Yishay Mansour, et al. Weakly Learning DNF and Characterizing Statistical Query Learning Using Fourier Analysis. STOC 1994.
[20] Leslie G. Valiant, et al. Functionality in Neural Nets. COLT 1988.