High-Entropy Dual Functions and Locally Decodable Codes (Extended Abstract)

Locally decodable codes (LDCs) allow any single encoded message symbol to be retrieved from a codeword with good probability by reading only a tiny number of codeword symbols, even if the codeword is partially corrupted. LDCs have surprisingly many applications in computer science and mathematics (we refer to [13, 10] for extensive surveys). But despite their ubiquity, they are poorly understood. Of particular interest is the tradeoff between the codeword length N and the message length k when the query complexity (the number of probed codeword symbols) and the alphabet size are constant. The Hadamard code is a 2-query LDC of length N = 2^{O(k)}, and this length is optimal in the 2-query regime [11]. For q ≥ 3, near-exponential gaps persist between the best-known upper and lower bounds. The family of Reed-Muller codes, which generalize the Hadamard code, was for a long time the best-known example, giving q-query LDCs of length exp(O(k^{1/(q−1)})), until the breakthrough constructions of matching vector LDCs of Yekhanin and Efremenko [12, 6]. In contrast with other combinatorial objects such as expander graphs, the probabilistic method has so far not been successfully used to beat the best explicit LDC constructions. In [3], a probabilistic framework was given that could in principle yield best-possible LDCs, albeit non-constructively. A special instance of this framework connects LDCs with a probabilistic version of Szemerédi’s theorem. The setup for this is as follows: For a finite abelian group G of size N = |G|, let D ⊆ G be a random subset where each element is present with probability ρ independently of all others. For k ≥ 3 and ε ∈ (0, 1), let E be the event that every subset A ⊆ G of size |A| ≥ ε|G| contains a proper k-term arithmetic progression with common difference in D. For fixed ε > 0 and sufficiently large N, it is an open problem to determine the smallest value of ρ, denoted ρ_k, such that Pr[E] ≥ 1/2.
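To make the random-differences setup above concrete, the following is a minimal brute-force sketch for the cyclic group Z_N with very small N. It is an illustration only and not a method from the paper: the function names are hypothetical, "proper" is assumed to mean that the k terms of the progression are pairwise distinct, and the running time is exponential in N. The check uses the observation that E holds if and only if every subset of size exactly ⌈εN⌉ contains a suitable progression, since larger sets contain smaller ones.

```python
import itertools
import math
import random


def sample_differences(n, rho, rng):
    """Sample D: each element of Z_n is included independently with probability rho."""
    return {d for d in range(n) if rng.random() < rho}


def has_ap_with_difference_in(a_set, diffs, n, k):
    """True if a_set contains a, a+d, ..., a+(k-1)d (mod n) for some d in diffs."""
    for d in diffs:
        for a in a_set:
            terms = [(a + i * d) % n for i in range(k)]
            # Assumption: "proper" means the k terms are pairwise distinct.
            if len(set(terms)) == k and all(t in a_set for t in terms):
                return True
    return False


def event_holds(n, k, eps, diffs):
    """Brute-force check of the event E over Z_n for a fixed difference set."""
    m = math.ceil(eps * n)  # smallest admissible size of A
    return all(
        has_ap_with_difference_in(set(subset), diffs, n, k)
        for subset in itertools.combinations(range(n), m)
    )


def estimate_probability(n, k, eps, rho, trials=200, seed=0):
    """Monte Carlo estimate of Pr[E] over the random choice of D."""
    rng = random.Random(seed)
    hits = sum(
        event_holds(n, k, eps, sample_differences(n, rho, rng))
        for _ in range(trials)
    )
    return hits / trials
```

For instance, calling estimate_probability(7, 3, 0.6, rho) for a range of values of rho gives a rough picture of where Pr[E] crosses 1/2 in a toy case. Such small experiments say nothing about the asymptotic behaviour of ρ_k.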
In [3] it is shown that there exist k-query LDCs of message length Ω(ρ_k N) and codeword length O(N). As such, Szemerédi’s theorem with random differences, in particular lower bounds on ρ_k, can be used to show the existence of LDCs. Conversely, this connection indirectly implies the best-known upper bounds on ρ_k for all k ≥ 3 [8, 4]. However, a conjecture from [9] states that over Z_N we have ρ_k ≤ O_k(N^{−1} log N) for all k, which would be best-possible. If true, this conjecture would imply that over this group, Szemerédi’s theorem with random differences cannot give LDCs better than the Hadamard code. For finite fields, Altman [1] showed that this is false. In particular, over F_p^n for p odd, he proved that ρ_3 ≥ Ω(p^{−n} n^2); generally, ρ_k ≥ Ω(p^{−n} n^{k−1}) holds when p ≥ k + 1 [2]. In turn, these bounds are conjectured to be optimal for the finite-field setting, which would imply that over finite fields, Szemerédi’s theorem with random differences cannot give LDCs better than Reed-Muller codes. The finite-field conjecture is motivated mainly by the possibility that so-called dual functions can be approximated well by polynomial phases, functions of the form e^{2πiP(x)/p} where P is a multivariate polynomial over F_p. We show that this is false. Using Yekhanin’s matching-vector-code construction, we give dual functions of order k over F_p that cannot be approximated in L∞-distance by polynomial phases of degree k − 1. This answers in the negative a natural finite-field analog of a problem of Frantzikinakis over ℕ [7, Problem 1].

2012 ACM Subject Classification Theory of computation