Finding Correlations in Subquadratic Time, with Applications to Learning Parities and Juntas

Given a set of n d-dimensional Boolean vectors with the promise that the vectors are chosen uniformly at random, with the exception of two vectors that have Pearson correlation $\rho$ (Hamming distance $d \cdot \frac{1-\rho}{2}$), how quickly can one find the two correlated vectors? We present an algorithm which, for any constants $\epsilon, \rho > 0$ and $d \gg \log n / \rho^2$, finds the correlated pair with high probability, and runs in time $O(n^{3\omega/4 + \epsilon}) < O(n^{1.8})$, where $\omega < 2.38$ is the exponent of matrix multiplication. Provided that d is sufficiently large, this runtime can be further reduced. This is the first subquadratic-time algorithm for this problem in which $\rho$ does not appear in the exponent of n, improving upon the $O(n^{2-O(\rho)})$ runtimes given by Paturi et al. [15], Locality Sensitive Hashing (LSH) [11], and the Bucketing Codes approach [6]. Applications and extensions of this basic algorithm yield improved algorithms for several other problems:

Approximate Closest Pair: For any sufficiently small constant $\epsilon > 0$, given n vectors in $\mathbb{R}^d$, our algorithm returns a pair of vectors whose Euclidean distance differs from that of the closest pair by a factor of at most $1+\epsilon$, and runs in time $O(n^{2-\Theta(\sqrt{\epsilon})})$. The best previous algorithms (including LSH) have runtime $O(n^{2-O(\epsilon)})$.

Learning Sparse Parity with Noise: Given samples from an instance of the learning parity with noise problem in which each example has length n, the true parity set has size at most $k \ll n$, and the noise rate is $\eta$, our algorithm identifies the set of k indices in time $n^{\frac{\omega+\epsilon}{3}k} \, \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \, \mathrm{poly}(\frac{1}{1-2\eta})$. Aside from the trivial brute-force algorithm, this is the first algorithm with no dependence on $\eta$ in the exponent of n.

Learning k-Juntas with Noise: Given uniformly random Boolean vectors of length n, together with a label, which is some function of just $k \ll n$ of the bits, perturbed by noise at rate $\eta$, return the set of relevant indices. Leveraging the reduction of Feldman et al. [7], our result for learning k-parities implies an algorithm for this problem with runtime $n^{\frac{\omega+\epsilon}{3}k} \, \mathrm{poly}(\frac{1}{1-2\eta}) < n^{0.8k} \, \mathrm{poly}(\frac{1}{1-2\eta})$, which improves on the previous best of $> n^{k(1-\frac{2}{2^k})} \, \mathrm{poly}(\frac{1}{1-2\eta})$, from [8].

Learning k-Juntas without Noise: Our results for learning sparse parities with noise imply an algorithm for learning juntas without noise with runtime $n^{\frac{\omega+\epsilon}{4}k} \, \mathrm{poly}(n) < n^{0.6k} \, \mathrm{poly}(n)$, which improves on the $n^{\frac{\omega}{\omega+1}k} \, \mathrm{poly}(n) \approx n^{0.7k} \, \mathrm{poly}(n)$ runtime of Mossel et al. [13].
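To make the problem setup concrete, here is a minimal Python sketch (ours, not the paper's algorithm) that plants a $\rho$-correlated pair among otherwise uniform $\pm 1$ vectors and recovers it with the naive quadratic-time baseline: compute all pairwise inner products via one Gram matrix product and take the largest off-diagonal entry. The planted pair's inner product concentrates around $\rho d$ while random pairs are $O(\sqrt{d \log n})$, which is why $d \gg \log n / \rho^2$ is assumed; the paper's contribution is beating this quadratic barrier. All function names and parameter values below are illustrative.

```python
import numpy as np

def plant_correlated_pair(n, d, rho, rng):
    """Return n uniform +/-1 vectors of dimension d, except rows 0 and 1,
    which agree on a (1+rho)/2 fraction of coordinates in expectation,
    i.e. Hamming distance d*(1-rho)/2 and Pearson correlation rho."""
    X = rng.choice([-1, 1], size=(n, d))
    # Copy row 0 into row 1, then flip each coordinate independently
    # with probability (1-rho)/2 to realize correlation rho.
    flips = rng.random(d) < (1 - rho) / 2
    X[1] = np.where(flips, -X[0], X[0])
    return X

def naive_correlated_pair(X):
    """Quadratic-time baseline: one matrix product gives all pairwise
    inner products; return the pair with the largest off-diagonal entry."""
    G = X @ X.T                      # G[i, j] = <x_i, x_j>
    d = X.shape[1]
    np.fill_diagonal(G, -d - 1)      # mask self inner products (always d)
    i, j = np.unravel_index(np.argmax(G), G.shape)
    return (int(i), int(j)), int(G[i, j])

rng = np.random.default_rng(0)
X = plant_correlated_pair(n=500, d=4000, rho=0.2, rng=rng)
pair, ip = naive_correlated_pair(X)
print(pair, ip / X.shape[1])  # expect (0, 1) and a value near rho
```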
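Similarly, the following hedged sketch illustrates the sparse parity with noise setup and the trivial $\approx n^k$ brute-force baseline that the $n^{0.8k} \, \mathrm{poly}(\frac{1}{1-2\eta})$ result improves upon; the function names and sample sizes are our assumptions, not the paper's.

```python
import numpy as np
from itertools import combinations

def sparse_lpn_samples(m, n, k, eta, rng):
    """Draw m uniform 0/1 examples of length n; each label is the parity
    of a hidden size-k index set S, flipped independently w.p. eta."""
    S = rng.choice(n, size=k, replace=False)
    A = rng.integers(0, 2, size=(m, n), dtype=np.int64)
    y = A[:, S].sum(axis=1) % 2
    y ^= (rng.random(m) < eta).astype(np.int64)  # label noise
    return A, y, set(S.tolist())

def brute_force_parity(A, y, k):
    """Trivial baseline: try every size-k subset. The true set agrees
    with the labels on about a (1-eta) fraction of samples, while any
    other set agrees on about 1/2, so m = poly(1/(1-2*eta)) * O(k log n)
    samples suffice to separate them."""
    n = A.shape[1]
    best = max(combinations(range(n), k),
               key=lambda S: int(np.sum(A[:, list(S)].sum(axis=1) % 2 == y)))
    return set(best)

rng = np.random.default_rng(1)
A, y, S = sparse_lpn_samples(m=400, n=30, k=2, eta=0.1, rng=rng)
print(brute_force_parity(A, y, 2) == S)  # expect True
```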

[1] Vitaly Feldman, et al. New Results for Learning Noisy Parities and Halfspaces, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[2] Rasmus Pagh, et al. Compressed matrix multiplication, 2011, ITCS '12.

[3] Noga Alon, et al. Approximating the cut-norm via Grothendieck's inequality, 2004, STOC '04.

[4] Karsten A. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time, 1990, COLT '90.

[5] A. Ron, et al. Strictly positive definite functions on spheres in Euclidean spaces, 1994, Math. Comput.

[6] Santosh S. Vempala, et al. On Noise-Tolerant Learning of Sparse Parities and Related Problems, 2011, ALT.

[7] Alexandr Andoni, et al. Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions, 2006, 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).

[8] Piotr Indyk, et al. Similarity Search in High Dimensions via Hashing, 1999, VLDB.

[9] Piotr Indyk, et al. Approximate nearest neighbors: towards removing the curse of dimensionality, 1998, STOC '98.

[10] Don Coppersmith, et al. Rectangular Matrix Multiplication Revisited, 1997, J. Complex.

[11] Russell Impagliazzo, et al. How to recycle random bits, 1989, 30th Annual Symposium on Foundations of Computer Science.

[12] Moshe Dubiner, et al. Bucketing Coding and Information Theory for the Statistical High-Dimensional Nearest-Neighbor Problem, 2008, IEEE Transactions on Information Theory.

[13] Manuel Blum, et al. Secure Human Identification Protocols, 2001, ASIACRYPT.

[14] Sanguthevar Rajasekaran, et al. The light bulb problem, 1995, COLT '89.

[15] Vadim Lyubashevsky, et al. The Parity Problem in the Presence of Noise, Decoding Random Linear Codes, and the Subset Sum Problem, 2005, APPROX-RANDOM.

[16] Virginia Vassilevska Williams, et al. Multiplying matrices faster than Coppersmith-Winograd, 2012, STOC '12.

[17] Ryan O'Donnell, et al. Learning functions of k relevant variables, 2004, J. Comput. Syst. Sci.

[18] J. Dicapua. Chebyshev Polynomials, 2019, Fibonacci and Lucas Numbers With Applications.

[19] Yishay Mansour, et al. Weakly learning DNF and characterizing statistical query learning using Fourier analysis, 1994, STOC '94.

[20] Leslie G. Valiant, et al. Functionality in neural nets, 1988, COLT '88.