Tiny Pointers

This paper introduces a new data-structural object that we call the tiny pointer. In many applications, traditional log n-bit pointers can be replaced with o(log n)-bit tiny pointers at the cost of only a constant-factor time overhead. We develop a comprehensive theory of tiny pointers, and give optimal constructions for both fixed-size tiny pointers (i.e., settings in which all of the tiny pointers must be the same size) and variable-size tiny pointers (i.e., settings in which the average tiny-pointer size must be small, but some tiny pointers can be larger). If a tiny pointer references an element in an array filled to load factor 1 − 1/k, then the optimal tiny-pointer size is Θ(log log log n + log k) bits in the fixed-size case, and Θ(log k) expected bits in the variable-size case. Our tiny-pointer constructions also require us to revisit several classic problems having to do with balls and bins; these results may be of independent interest.

Using tiny pointers, we revisit five classic data-structure problems. We show that:

• A data structure storing n v-bit values for n keys with constant-time modifications/queries can be implemented to take space nv + O(n log^(r) n) bits, for any constant r > 0, as long as the user stores a tiny pointer of expected size O(1) with each key. Here, log^(r) n is the r-th iterated logarithm.
• Any binary search tree can be made succinct with constant-factor time overhead, and can even be made to be within O(n) bits of optimal if we allow for O(log* n)-time modifications; this holds even for rotation-based trees such as the splay tree and the red-black tree.
• Any fixed-capacity key-value dictionary can be made stable (i.e., items do not move once inserted) with constant-time overhead and 1 + o(1) space overhead.
• Any key-value dictionary that requires uniform-size values can be made to support arbitrary-size values with constant-time overhead and with an additional space consumption of log^(r) n + O(log j) bits per j-bit value, for an arbitrary constant r > 0 of our choice.
• Given an external-memory array A of size (1 + ε)n containing a dynamic set of up to n key-value pairs, it is possible to maintain an internal-memory stash of size O(n log ε^(−1)) bits so that the location of any key-value pair in A can be computed in constant time (and with no IOs).

These are all well-studied classic problems, and in each case tiny pointers allow us to take a natural space-inefficient solution that uses pointers and make it space-efficient for free.

∗ Stony Brook University. bender@cs.stonybrook.edu
† VMware Research Group. aconway@vmware.com
‡ Rutgers University. martin@farach-colton.com
§ Massachusetts Institute of Technology. kuszmaul@mit.edu
¶ Rutgers University. guido.tag@rutgers.edu

arXiv:2111.12800v1 [cs.DS] 24 Nov 2021
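The basic idea of a pointer that is only meaningful together with its owning key can be illustrated with a toy sketch. The code below is not the paper's optimal construction and makes no claim to the stated bounds; the class `TinyPointerTable`, its methods, and the SHA-256 bucket hash are all assumptions chosen for illustration. Each key hashes to a fixed-size bucket, and the tiny pointer is just the slot offset inside that bucket, so it needs about log(bucket size) bits rather than log n bits.

```python
import hashlib

_EMPTY = object()  # sentinel marking a free slot


class TinyPointerTable:
    """Toy table: a key picks a bucket by hashing; the tiny pointer is the slot offset."""

    def __init__(self, num_buckets: int, bucket_size: int):
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.slots = [[_EMPTY] * bucket_size for _ in range(num_buckets)]

    def _bucket(self, key: str) -> int:
        # Deterministic hash of the key (hypothetical choice for this sketch).
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.num_buckets

    def allocate(self, key: str, value):
        """Store value in the key's bucket; return the offset, or None if the bucket is full."""
        bucket = self.slots[self._bucket(key)]
        for offset, slot in enumerate(bucket):
            if slot is _EMPTY:
                bucket[offset] = value
                return offset
        return None

    def dereference(self, key: str, tiny_pointer: int):
        """Recover the value from the key plus its tiny pointer."""
        return self.slots[self._bucket(key)][tiny_pointer]

    def free(self, key: str, tiny_pointer: int) -> None:
        self.slots[self._bucket(key)][tiny_pointer] = _EMPTY

    def pointer_bits(self) -> int:
        """Bits needed to encode one offset within a bucket."""
        return max(1, (self.bucket_size - 1).bit_length())


table = TinyPointerTable(num_buckets=16, bucket_size=8)
ptr = table.allocate("alice", "record-1")
value = table.dereference("alice", ptr)
full_pointer_bits = (16 * 8 - 1).bit_length()  # bits to address any slot directly
```

In this sketch the offset takes 3 bits, whereas addressing any of the 128 slots directly would take 7. A single bucket can fill up while the table is still mostly empty, which is the kind of balls-and-bins issue the abstract alludes to; this toy version simply reports failure in that case.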
