Matching of Compressed Patterns with Character-Variables

We consider the problem of finding an instance of a string pattern s in a given string, where the strings are compressed by straight-line programs (SLPs). The variables of the string pattern can be instantiated by single characters. This generalises fully compressed pattern matching, the task of finding one compressed string in another compressed string, for which a polynomial-time algorithm is known. We mainly investigate patterns s that are linear in the variables, i.e. each variable occurs at most once in s; such patterns are also known as partial words. We show that fully compressed pattern matching with linear patterns can be performed in polynomial time. The algorithm also computes a polynomial-sized representation of all matches and all substitutions. In addition, we give a related algorithm that computes all periods of a compressed linear pattern in polynomial time. A key technical result on the structure of partial words shows that an overlap of h+2 copies of a partial word w with at most h holes implies that w is strongly periodic.
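To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the polynomial-time algorithm of the paper. It decompresses both SLPs and then tests every position, which can take time exponential in the SLP size. The rule encoding (a dictionary mapping each nonterminal to a single character or to a pair of nonterminals), the hole marker `?`, and the names `expand` and `matches` are assumptions made for illustration only.

```python
from functools import lru_cache

HOLE = "?"  # hypothetical marker for a character-variable (a hole)


def expand(slp, symbol):
    """Decompress the nonterminal `symbol` of an SLP into a plain string.

    `slp` maps each nonterminal either to a single character or to a
    pair (left, right) of nonterminals (assumed encoding).
    """
    @lru_cache(maxsize=None)
    def go(sym):
        rule = slp[sym]
        if isinstance(rule, str):
            return rule          # terminal rule: X -> c
        left, right = rule
        return go(left) + go(right)  # concatenation rule: X -> Y Z

    return go(symbol)


def matches(pattern, text):
    """Return all (start, substitution) pairs where `pattern` matches `text`.

    Each hole stands for exactly one character. Since the pattern is
    linear, every hole is independent, so the substitution simply maps
    the index of each hole in the pattern to the character it receives.
    """
    found = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        window = text[start:start + m]
        if all(p == HOLE or p == c for p, c in zip(pattern, window)):
            subst = {i: c for i, (p, c) in enumerate(zip(pattern, window))
                     if p == HOLE}
            found.append((start, subst))
    return found


# Text SLP: T3 derives "abababab" from five rules.
text_slp = {"A": "a", "B": "b", "T1": ("A", "B"),
            "T2": ("T1", "T1"), "T3": ("T2", "T2")}
# Pattern SLP: P derives "a?a", a linear pattern with one hole.
pattern_slp = {"A": "a", "H": HOLE, "P1": ("A", "H"), "P": ("P1", "A")}

print(matches(expand(pattern_slp, "P"), expand(text_slp, "T3")))
```

In this example the pattern a?a matches at positions 0, 2 and 4 of abababab, each time with the hole instantiated by b. The point of the result stated above is that such matches, and a representation of all of them, can be obtained without ever building the decompressed strings.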
