Order-Preserving Pattern Matching Indeterminate Strings

Given an indeterminate string pattern p and an indeterminate string text t, the problem of order-preserving pattern matching with character uncertainties (μOPPM) is to find all substrings of t that satisfy one of the possible orderings defined by p. When the text and pattern are determinate strings, the problem reduces to the well-studied exact order-preserving pattern matching (OPPM) problem, which has diverse applications in time series analysis. Despite its relevance, the exact OPPM problem suffers from two major drawbacks: 1) the inability to deal with indetermination in the text, which prevents the analysis of noisy time series; and 2) the inability to deal with indetermination in the pattern, which imposes strict satisfaction of the orders among all pattern positions. In this paper, we provide the first polynomial-time algorithms for the μOPPM problem when: 1) indetermination is observed on the pattern or on the text; and 2) indetermination is observed on both the pattern and the text and is given by uncertainties between pairs of characters. First, given two strings of the same length m and O(r) uncertain characters per string position, we show that the μOPPM problem can be solved in O(mr lg r) time when one string is indeterminate and r ∈ ℕ⁺, and in O(m²) time when both strings are indeterminate and r = 2. Second, given an indeterminate text string of length n, we show that μOPPM can be solved efficiently in polynomial time and linear space.
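To make the problem statement concrete, the following Python sketch covers only the case of a determinate pattern matched against an indeterminate text. It assumes integer characters, represents each indeterminate position as a Python set of candidate values, and uses the usual order-isomorphism definition (p[i] ≤ p[j] exactly when x[i] ≤ x[j] for all i, j). The names `order_isomorphic`, `realizable` and `mu_oppm_search` are hypothetical. The per-window check is a simple greedy over pattern ranks chosen for illustration; it is not claimed to be the authors' O(mr lg r) algorithm, and it does not address the case where both strings are indeterminate.

```python
from itertools import groupby
from typing import List, Optional, Sequence, Set


def order_isomorphic(a: Sequence[int], b: Sequence[int]) -> bool:
    """Exact OPPM test: a[i] <= a[j] holds exactly when b[i] <= b[j], for all i, j."""
    if len(a) != len(b):
        return False
    n = len(a)
    return all((a[i] <= a[j]) == (b[i] <= b[j]) for i in range(n) for j in range(n))


def realizable(p: Sequence[int], window: Sequence[Set[int]]) -> Optional[List[int]]:
    """Pick x[j] from window[j] so that x is order-isomorphic to p, or return None.

    Illustrative greedy (an assumption, not the paper's method): groups of equal
    pattern values are handled in increasing order, and each group takes the
    smallest common candidate strictly larger than the previous group's value.
    """
    if len(p) != len(window):
        return None
    order = sorted(range(len(p)), key=lambda j: p[j])
    choice = [0] * len(p)
    prev: Optional[int] = None
    for _, group in groupby(order, key=lambda j: p[j]):
        members = list(group)
        # Positions with equal pattern values must share one common value.
        common = set.intersection(*(set(window[j]) for j in members))
        candidates = [v for v in common if prev is None or v > prev]
        if not candidates:
            return None  # no admissible value for this rank
        prev = min(candidates)
        for j in members:
            choice[j] = prev
    return choice


def mu_oppm_search(p: Sequence[int], t: Sequence[Set[int]]) -> List[int]:
    """Start positions i such that some realization of t[i:i+m] matches p."""
    m = len(p)
    return [i for i in range(len(t) - m + 1) if realizable(p, t[i:i + m]) is not None]


if __name__ == "__main__":
    text = [{5}, {1, 9}, {4, 7}, {2}, {8, 3}]
    print(mu_oppm_search([1, 3, 2], text))  # [0, 1]
```

Taking the smallest admissible value at each rank never rules out a feasible choice for a later rank, which is why the greedy is exact for this one-sided case. Because order-isomorphism is symmetric, the same check also covers an indeterminate pattern against a determinate text window, by passing the text window as `p` and the pattern's sets as `window`.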
