On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of a Parikh vector q (a "jumbled string") in the text s requires finding a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. For example, for the alphabet Σ = {a,b,c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s, at position (1,4). Like its more precise counterpart, the renowned Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, anagram checking, Scrabble playing and, allegedly, also analysis of mass spectrometry data. We consider two simple algorithms for Jumbled Pattern Matching and use very complicated data structures and analytic tools to show that they are not worse than the most obvious algorithm. We also show that we can achieve non-trivial efficient average case behavior, but that's less fun to describe in this abstract, so we defer the details to the main part of the article, to be read at the reader's risk... well, at the reader's discretion.
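The definitions above can be made concrete with a short sketch. It assumes a fixed-width sliding window as a straightforward baseline; this is an illustration of the problem statement, not necessarily either of the two algorithms the article analyzes. The names `parikh` and `jumbled_matches`, and the convention of reporting 1-based inclusive (start, end) positions, are assumptions chosen to match the example in the abstract.

```python
from collections import Counter


def parikh(s, alphabet):
    """Return the Parikh vector of s as a tuple ordered like `alphabet`."""
    counts = Counter(s)
    return tuple(counts[c] for c in alphabet)


def jumbled_matches(s, q, alphabet):
    """Return 1-based inclusive (start, end) pairs of all substrings t with p(t) == q.

    Assumes every character of s belongs to `alphabet`.
    """
    m = sum(q)  # every match has length equal to the sum of q's entries
    if m == 0 or m > len(s):
        return []
    index = {c: i for i, c in enumerate(alphabet)}
    window = [0] * len(alphabet)  # Parikh vector of the current window
    target = list(q)
    matches = []
    for i, ch in enumerate(s):
        window[index[ch]] += 1  # character entering the window
        if i >= m:
            window[index[s[i - m]]] -= 1  # character leaving the window
        if i >= m - 1 and window == target:
            matches.append((i - m + 2, i + 1))  # convert to 1-based positions
    return matches


print(parikh("abaccbabaaa", "abc"))                      # (6, 3, 2)
print(jumbled_matches("abaccbabaaa", (2, 1, 1), "abc"))  # [(1, 4)]
```

Each window update touches two counters, and each comparison costs time proportional to the alphabet size, so the sketch runs in O(n·|Σ|) time for a text of length n.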
