LF Successor: Compact Space Indexing for Order-Isomorphic Pattern Matching

Two strings are order-isomorphic iff the relative ordering of their characters is the same at all positions. For a given text T[1, n] over an ordered alphabet of size σ, we can maintain an order-isomorphic suffix tree/array in O(n log n) bits and support (order-isomorphic) pattern/substring matching queries efficiently. A natural question is whether these structures can be encoded in space close to the text's size of n log σ bits. We answer this question positively by presenting an O(n log σ)-bit index that allows access to any entry of the order-isomorphic suffix array (and its inverse array) in t_SA = O(log^2 n / log σ) time. For any pattern P given as a query, this index can count the number of substrings of T that are order-isomorphic to P (denoted by occ) in O((|P| log σ + t_SA) log n) time using standard techniques. It can also report the locations of those substrings in additional O(occ · t_SA) time.

2012 ACM Subject Classification: Theory of computation → Pattern matching
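To make the abstract's definition of order-isomorphism concrete, here is a minimal brute-force sketch in Python. It is not the compact index described above: it only illustrates what the index's count and locate queries compute, and it runs in far worse time than the stated bounds. The function names (`rank_encode`, `order_isomorphic`, `naive_locate`) are hypothetical, chosen for illustration. Positions are 0-based, whereas the abstract indexes T from 1.

```python
def rank_encode(s):
    """Replace each symbol by its dense rank among the distinct symbols of s.

    Two equal-length sequences are order-isomorphic exactly when their
    rank encodings coincide.
    """
    ranks = {c: r for r, c in enumerate(sorted(set(s)))}
    return tuple(ranks[c] for c in s)


def order_isomorphic(x, y):
    """Return True iff x and y have the same relative ordering at all positions."""
    return len(x) == len(y) and rank_encode(x) == rank_encode(y)


def naive_locate(text, pattern):
    """Return all 0-based start positions i such that text[i:i+|P|] is
    order-isomorphic to pattern (brute force, for illustration only)."""
    m = len(pattern)
    target = rank_encode(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if rank_encode(text[i:i + m]) == target
    ]


text = [3, 1, 4, 1, 5, 9, 2, 6]
pattern = [2, 1, 3]
positions = naive_locate(text, pattern)
print(positions)       # [0, 2]
print(len(positions))  # occ = 2
```

Here the windows (3, 1, 4) and (4, 1, 5) both have the shape "middle, smallest, largest", the same as the pattern (2, 1, 3), so occ = 2.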
