Succinct Indexes for Circular Patterns

Circular patterns are patterns whose circular permutations (rotations) are also valid patterns. Such patterns arise naturally in bioinformatics and computational geometry. In this paper, we consider succinct indexing schemes for a set of d circular patterns of total length n, with each character drawn from an alphabet of size σ. Our method defines the popular Burrows-Wheeler transform (BWT) on circular patterns. Based on this transform, we obtain succinct indexes that occupy n log σ (1 + o(1)) + O(n) + O(d log n) bits and support pattern matching and dictionary matching queries efficiently.
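
The abstract does not spell out how the BWT is defined on circular patterns. The sketch below illustrates one natural definition and is an assumption: it may differ from the paper's. Under this definition, every rotation of every pattern is sorted by comparing the infinite repetitions of the rotations (the "omega order" used by the extended BWT of Mantaci et al.), and the last character of each sorted rotation is output. The names `omega_compare` and `circular_bwt` are hypothetical. The construction is naive and not succinct; it only shows the ordering.

```python
from functools import cmp_to_key


def omega_compare(u: str, v: str) -> int:
    """Compare the infinite repetitions u^inf and v^inf lexicographically.

    Returns -1, 0, or 1. By the Fine-Wilf periodicity lemma, if u^inf and
    v^inf agree on their first len(u) + len(v) characters they are equal,
    so checking that many characters is enough.
    """
    for i in range(len(u) + len(v)):
        a, b = u[i % len(u)], v[i % len(v)]
        if a != b:
            return -1 if a < b else 1
    return 0


def circular_bwt(patterns: list[str]) -> tuple[str, list[tuple[int, int]]]:
    """Naive BWT of a set of circular patterns (illustrative, not succinct).

    Assumed definition: list every rotation of every pattern, sort the
    rotations by omega_compare, and output the last character of each.
    Returns the BWT string and, for each BWT position, the
    (pattern index, rotation offset) that produced it.
    """
    rotations = []
    for idx, p in enumerate(patterns):
        for k in range(len(p)):
            rotations.append((p[k:] + p[:k], idx, k))
    # list.sort() is stable, so rotations that compare equal (e.g. identical
    # or periodic patterns) keep their (pattern, offset) input order.
    rotations.sort(key=cmp_to_key(lambda x, y: omega_compare(x[0], y[0])))
    bwt = "".join(rot[-1] for rot, _, _ in rotations)
    order = [(idx, k) for _, idx, k in rotations]
    return bwt, order


if __name__ == "__main__":
    bwt, order = circular_bwt(["abc", "ab"])
    print(bwt)    # bcaab
    print(order)  # [(1, 0), (0, 0), (1, 1), (0, 1), (0, 2)]
```

No end-of-string sentinel is needed. For example, "b" sorts after "ba" here even though it is a prefix, because bbb... is greater than bab.... Materializing every rotation costs quadratic space in the pattern lengths, which is far from the bounds stated above.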