How to answer a small batch of RMQs or LCA queries in practice

In the Range Minimum Query (RMQ) problem, we are given an array \(A\) of \(n\) numbers and asked to answer queries of the following type: for indices \(i \le j\) between 0 and \(n-1\), the query \(\text {RMQ}_A(i,j)\) returns the index of a minimum element in the subarray \(A[i\mathinner {.\,.}j]\). Answering a small batch of RMQs is a core computational task in many real-world applications, in particular because of its connection with the Lowest Common Ancestor (LCA) problem. By a small batch, we mean that the number \(q\) of queries is \(o(n)\) and that all queries are available in advance. In this setting it is not worthwhile to build an \(\varOmega (n)\)-sized data structure, or to spend \(\varOmega (n)\) time building a more succinct one. It is well known, among practitioners and elsewhere, that such data structures for online querying carry high constants in their pre-processing and querying time. We would thus like to answer the batch efficiently in practice, by which we mean that we (ultimately) want to spend \(n + \mathcal {O}(q)\) time and \(\mathcal {O}(q)\) space. We write \(n\) rather than \(\mathcal {O}(n)\) to stress that the number of operations per entry of \(A\) should be a very small constant. Here we show how existing algorithms can be easily modified to satisfy these conditions. The presented experimental results highlight the practicality of this new scheme. The most significant improvement is obtained when answering a small batch of LCA queries. A library implementation of the presented algorithms is made available.
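To make the batched setting concrete, the following is a minimal Python sketch of one way to answer all queries with a single scan of \(A\). It is an illustration under stated assumptions, not the library implementation mentioned above. The function name `batch_rmq` and the choice of a sparse table over the contracted array are hypothetical. Because of the sorting step and the sparse table, this sketch runs in \(n + \mathcal {O}(q \log q)\) time and \(\mathcal {O}(q \log q)\) space, so it does not reach the \(n + \mathcal {O}(q)\) time and \(\mathcal {O}(q)\) space targeted above.

```python
from bisect import bisect_left


def batch_rmq(A, queries):
    """Answer a batch of RMQs offline.

    `queries` is a list of (i, j) pairs with 0 <= i <= j < len(A).
    Returns, for each query, the index of the leftmost minimum of A[i..j].
    Hypothetical helper for illustration only.
    """
    if not queries:
        return []

    # Each query start i and one-past-end j + 1 is a cut point, so the
    # O(q) cut points split the relevant part of A into O(q) segments.
    cuts = sorted({i for i, _ in queries} | {j + 1 for _, j in queries})

    # One scan over A: record the index of the minimum of every segment.
    seg_min = []
    for lo, hi in zip(cuts, cuts[1:]):
        best = lo
        for p in range(lo + 1, hi):
            if A[p] < A[best]:
                best = p
        seg_min.append(best)

    # Sparse table over the contracted array (entries are indices into A).
    m = len(seg_min)
    table = [seg_min]
    span = 1
    while 2 * span <= m:
        prev = table[-1]
        table.append([
            prev[k] if A[prev[k]] <= A[prev[k + span]] else prev[k + span]
            for k in range(m - 2 * span + 1)
        ])
        span *= 2

    answers = []
    for i, j in queries:
        a = bisect_left(cuts, i)      # first segment inside the query
        b = bisect_left(cuts, j + 1)  # one past the last segment
        level = (b - a).bit_length() - 1
        left = table[level][a]
        right = table[level][b - (1 << level)]
        answers.append(left if A[left] <= A[right] else right)
    return answers


A = [3, 1, 4, 1, 5, 9, 2, 6]
print(batch_rmq(A, [(0, 2), (2, 4), (4, 7), (3, 3), (0, 7)]))
# prints [1, 3, 6, 3, 1]
```

Every entry of \(A\) between the smallest and largest query endpoint is inspected exactly once, and all remaining work depends only on \(q\). Replacing the sparse table with a linear-time offline method on the contracted array is what would be needed to bring the \(q\)-dependent cost down to \(\mathcal {O}(q)\).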
