The limits of buffering: a tight lower bound for dynamic membership in the external memory model

We study the dynamic membership (or dynamic dictionary) problem, one of the most fundamental problems in data structures, in the external memory model with cell size b bits and cache size m bits. We prove that if the amortized cost of updates is at most 0.999 (or any other constant < 1), then the query cost must be Ω(log_{b log n}(n/m)), where n is the number of elements in the dictionary. In contrast, when the update time is allowed to be 1 + o(1), a bit vector or hash table gives query time O(1). Thus, this is a threshold phenomenon for data structures. This lower bound answers a folklore conjecture of the external memory community. Since almost any data structure task can solve membership, our lower bound implies a dichotomy between two alternatives: (i) make the amortized update time at least 1 (so the data structure does not buffer, and we lose one of the main potential advantages of the cache), or (ii) make the query time at least roughly logarithmic in n. Our result holds even when the updates and queries are chosen uniformly at random and there are no deletions; it holds for randomized data structures, holds when the universe size is O(n), and does not make any restrictive assumptions such as indivisibility. All of the lower bounds we prove hold regardless of the space consumption of the data structure, while the upper bounds only need linear space. The lower bound has some striking implications for external memory data structures. It shows that the query complexities of many problems, such as 1D-range counting, predecessor, and rank-select, are all the same in the regime where the amortized update time is less than 1, as long as the cell size is large enough (b = polylog(n) suffices).
The proof of our lower bound is based on a new combinatorial lemma, the Lemma of Surprising Intersections (LOSI). It enables a proof methodology in which we first analyze the intersection structure of the positive queries using encoding arguments, and then use statistical arguments to deduce properties of the intersection structure of all queries, including the negative ones. In most other data structure arguments that we know, it is difficult to argue anything about the negative queries. We therefore believe that the LOSI and this proof methodology might find future uses for other problems.
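
To make the two sides of the threshold concrete, the following Python sketch illustrates the unbuffered bit-vector upper bound and evaluates the expression inside the Ω of the lower bound. It is an illustration under stated assumptions, not a construction from the paper: the class `BitVectorDictionary`, the function `query_bound_expression`, and the convention of charging exactly one probe per operation (with the cache ignored) are all hypothetical choices made here.

```python
import math


class BitVectorDictionary:
    """Membership over the universe [0, universe_size), stored in cells of b bits.

    Assumption of this sketch: every insert or query is charged one cell
    probe, and the cache is ignored, so nothing is ever buffered.
    """

    def __init__(self, universe_size, cell_bits):
        self.b = cell_bits
        num_cells = (universe_size + cell_bits - 1) // cell_bits
        self.cells = [0] * num_cells  # each int models one b-bit cell
        self.probes = 0               # total cell probes so far

    def insert(self, x):
        cell, offset = divmod(x, self.b)
        self.probes += 1              # touch the single cell holding bit x
        self.cells[cell] |= 1 << offset

    def contains(self, x):
        cell, offset = divmod(x, self.b)
        self.probes += 1              # one probe answers the query
        return (self.cells[cell] >> offset) & 1 == 1


def query_bound_expression(n, m, b):
    """Evaluate log_{b log n}(n/m), ignoring the constant hidden by Omega."""
    return math.log(n / m) / math.log(b * math.log2(n))


d = BitVectorDictionary(universe_size=1024, cell_bits=64)
d.insert(5)
d.insert(700)
print(d.contains(5), d.contains(6), d.probes)
print(query_bound_expression(n=2**40, m=2**20, b=2**10))
```

With one probe per insert and per query, the sketch sits on the "update cost at least 1" side of the dichotomy. The second function only evaluates the formula for given parameters; the hidden constant means its value should be read as a growth rate, not as an exact probe count.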
