Streaming K-Mismatch with Error Correcting and Applications

We present a new streaming algorithm for the k-Mismatch problem, one of the most basic problems in pattern matching. Given a pattern and a text, the task is to find all substrings of the text that are at Hamming distance at most k from the pattern. Our algorithm is enhanced with an important new feature called Error Correcting, and its complexities for k = 1 and for general k match those of the best known solutions for the k-Mismatch problem from FOCS 2009 and SODA 2016. As a corollary, we develop a series of streaming algorithms for pattern matching on weighted strings, which are a commonly used representation of uncertain sequences in molecular biology.
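
To make the problem statement concrete, the following is a minimal sketch of the k-Mismatch definition only. It is a naive quadratic-time baseline that stores the whole text, not the streaming algorithm presented in this work, and the function names `hamming_distance` and `k_mismatch_occurrences` are illustrative assumptions.

```python
from typing import List


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is defined for equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def k_mismatch_occurrences(pattern: str, text: str, k: int) -> List[int]:
    """Start positions i such that text[i:i+m] is within Hamming distance k
    of the pattern, where m = len(pattern).

    Naive baseline for illustration: O(n * m) time, whole text in memory.
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_distance(pattern, text[i:i + m]) <= k
    ]


if __name__ == "__main__":
    # Every length-3 window of the text is compared against the pattern.
    print(k_mismatch_occurrences("abc", "abcabdxbc", 1))
```

With k = 0 the same function reports exact occurrences only, which is the special case of exact pattern matching.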
