A brief history of parameterized matching problems

Abstract

Parameterized pattern matching is a string-searching variant that was originally introduced to detect duplicated code and later proved useful in several other applications. Formally, two equal-length strings X and Y are a parameterized match if there exists a bijective function g such that X[i] = g(Y[i]) for every position i, that is, X = g(Y) when g is applied symbol by symbol. Baker was the first to address this problem (Baker, 1993), and her work opened up a broad field of research that many others have since pursued. Variants and extensions studied over the years include parameterized matching under edit and Hamming distances, parameterized multi-pattern matching, two-dimensional parameterized matching, structural matching, function matching and, most recently, succinct and streaming models. This sustained research activity reflects the usefulness of the problem's practical applications in software maintenance, image processing and bioinformatics, among others. Although the problem was posed about 25 years ago, research on parameterized matching remains very active. Its extensive study over the years and its current relevance motivate us to review the most notable contributions as a road map for current and future research.
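To make the definition above concrete, here is a minimal sketch of a parameterized-match test and a brute-force search built on it. It assumes both strings are over the same finite alphabet and treats every symbol as a parameter symbol, as in the simplified definition; a consistent one-to-one mapping between the symbols that actually occur can always be extended to a bijection on the whole alphabet. The function names `is_p_match` and `p_match_occurrences` are hypothetical, and the O(nm) search is only an illustration, not one of the algorithms reviewed in this survey.

```python
def is_p_match(x: str, y: str) -> bool:
    """Return True if some bijection g satisfies x[i] == g(y[i]) for every i.

    Assumption: every symbol is a parameter symbol (no constant symbols).
    """
    if len(x) != len(y):
        return False
    g = {}      # forward map: symbol of y -> symbol of x
    g_inv = {}  # inverse map: symbol of x -> symbol of y, keeps g injective
    for a, b in zip(x, y):
        # b must always map to the same a, and a must have a single preimage b.
        if g.setdefault(b, a) != a or g_inv.setdefault(a, b) != b:
            return False
    return True


def p_match_occurrences(text: str, pattern: str) -> list[int]:
    """Brute-force search: start positions of text windows that p-match pattern."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if is_p_match(text[i:i + m], pattern)]


if __name__ == "__main__":
    print(is_p_match("abab", "cdcd"))                # True: g = {c: a, d: b}
    print(is_p_match("aab", "cdd"))                  # False: a would need two preimages
    print(p_match_occurrences("abbaxyyx", "cddc"))   # [0, 4]
```

The two dictionaries check the two halves of bijectivity separately: `g` guarantees that g is a function, and `g_inv` guarantees that it is injective.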
