A Fast x86 Implementation of Select

Rank and select are fundamental operations in succinct data structures, that is, data structures whose space consumption approaches the information-theoretic optimum. The performance of these primitives is central to the overall performance of succinct data structures. Traditionally, select has been the harder of the two to implement efficiently, and most prior implementations of select on machine words use 50 to 80 machine instructions. (In contrast, rank on machine words can be implemented in only a handful of instructions on machines that support POPCOUNT.) Recently, however, Pandey et al. gave a new implementation of machine-word select that uses only four x86 machine instructions, two of which were introduced in Intel's Haswell CPUs. In this paper, we investigate the impact of this new machine-word select on the performance of general bit-vector select. We first compare Pandey et al.'s machine-word select to the state-of-the-art implementations of Zhou et al. (which is not specific to Haswell) and Gog et al. (which uses some Haswell-specific instructions), and we observe a speedup of 2X to 4X. We then study the impact of plugging Pandey et al.'s machine-word select into two state-of-the-art bit-vector select implementations. Both Zhou et al.'s and Gog et al.'s implementations perform a single machine-word select operation for each bit-vector select. We replaced that machine-word select with the new implementation and compared performance. Even though each bit-vector select performs only a single machine-word select, we still obtained speedups of 20% to 68%. We found that the new select not only reduced the number of instructions required for each bit-vector select, but also improved CPU instruction-cache performance and memory-access parallelism.
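To make the machine-word primitives concrete, the following is a minimal, portable sketch rather than the implementation evaluated here. It assumes that the two Haswell-era instructions referred to above are PDEP (parallel bit deposit) and TZCNT (count trailing zeros), and it emulates both in software so that the logic can be checked on any machine. The function names `pdep`, `tzcnt`, `select64`, and `rank64` are illustrative choices, not names taken from any of the cited libraries.

```python
MASK64 = (1 << 64) - 1  # restrict values to a 64-bit machine word


def pdep(src, mask):
    """Software emulation of PDEP: deposit the low-order bits of src,
    one by one, into the positions of the set bits of mask."""
    result = 0
    bit = 1
    while mask:
        lowest = mask & -mask      # isolate the lowest set bit of mask
        if src & bit:
            result |= lowest       # copy the current src bit to that position
        mask ^= lowest             # clear the bit we just handled
        bit <<= 1
    return result


def tzcnt(x):
    """Software emulation of TZCNT on a 64-bit word: number of trailing
    zero bits, or 64 when x is zero."""
    if x == 0:
        return 64
    return (x & -x).bit_length() - 1


def select64(word, k):
    """Position of the k-th set bit of word (k counted from 0),
    or 64 if word has fewer than k + 1 set bits."""
    # Depositing the single bit 1 << k into the set-bit positions of word
    # leaves exactly the k-th set bit of word; TZCNT then reads its index.
    return tzcnt(pdep(1 << k, word & MASK64))


def rank64(word, pos):
    """Number of set bits of word strictly below bit position pos
    (the role POPCOUNT plays on hardware)."""
    return bin(word & MASK64 & ((1 << pos) - 1)).count("1")


if __name__ == "__main__":
    w = 0b10110                    # set bits at positions 1, 2 and 4
    print(select64(w, 0), select64(w, 2), select64(w, 3))
    print(rank64(w, 3))
```

On hardware with BMI2 support, the two emulated helpers would correspond to single instructions (exposed in C as the `_pdep_u64` and `_tzcnt_u64` intrinsics), which is what would keep the word-level select to a handful of instructions under the assumption stated above.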

[1]  Gonzalo Navarro, et al. Fast, Small, Simple Rank/Select on Bitmaps, 2012, SEA.

[2]  R. González, et al. Practical Implementation of Rank and Select Queries, 2005.

[3]  Michael A. Bender, et al. A General-Purpose Counting Filter: Making Every Bit Count, 2017, SIGMOD Conference.

[4]  Giovanni Manzini, et al. Opportunistic data structures with applications, 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[5]  J. Shane Culpepper, et al. Top-k Ranked Document Search in General Text Databases, 2010, ESA.

[6]  Alistair Moffat, et al. From Theory to Practice: Plug and Play with Succinct Data Structures, 2013, SEA.

[7]  Dong Zhou, et al. Space-Efficient, High-Performance Rank and Select Structures on Uncompressed Bit Sequences, 2013, SEA.

[8]  Sebastiano Vigna, et al. Broadword Implementation of Rank/Select Queries, 2008, WEA.

[9]  Simon Gog, et al. Optimized succinct data structures for massive data, 2014, Softw. Pract. Exp.

[10]  Rajeev Raman, et al. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets, 2002, SODA '02.

[11]  Kunihiko Sadakane, et al. Compressed Suffix Trees with Full Functionality, 2007, Theory of Computing Systems.

[12]  Guy Jacobson, et al. Space-efficient static trees and graphs, 1989, 30th Annual Symposium on Foundations of Computer Science.

[13]  Ruby B. Lee, et al. Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions, 2006, IEEE 17th International Conference on Application-specific Systems, Architectures and Processors (ASAP'06).

[14]  Roberto Grossi, et al. Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching, 2005, SIAM J. Comput.

[15]  Gonzalo Navarro, et al. Spaces, Trees, and Colors, 2013, ACM Comput. Surv.

[16]  Peter Elias, et al. Efficient Storage and Retrieval by Content and Address of Static Files, 1974, JACM.