Subsequence Automata with Default Transitions

Abstract

Let $S$ be a string of length $n$ with characters from an alphabet of size $\sigma$. The subsequence automaton of $S$ (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of $S$. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $O(n\sigma)$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions that are taken only if none of the regular transitions match the current character, and that do not consume the current character. We show that with default transitions, much smaller subsequence automata are possible, and we provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter $k$, $1 < k \le \sigma$, we present a subsequence automaton with default transitions of size $O(nk \log_k \sigma)$ and delay $O(\log_k \sigma)$. Hence, with $k = 2$ we obtain an automaton of size $O(n \log \sigma)$ and delay $O(\log \sigma)$. At the other extreme, with $k = \sigma$, we obtain an automaton of size $O(n\sigma)$ and delay $O(1)$, thus matching the bound for the standard subsequence automaton construction. Finally, we generalize the result to multiple strings. The key component of our result is a novel hierarchical automata construction of independent interest.
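To make the notions of default transition and delay concrete, the following Python sketch contrasts two simple automata for a string. The first is the straightforward construction with $O(n\sigma)$ transitions. The second is a deliberately naive default-transition automaton with $O(n)$ transitions and delay up to $n$; it is only an illustration of the definitions and is not the hierarchical construction of the paper. All function names are hypothetical and chosen for this example.

```python
def build_subsequence_automaton(s):
    """Straightforward construction: state i (0..n) has, for each character c,
    a transition to the smallest j > i such that s[j-1] == c, if one exists."""
    n = len(s)
    delta = [None] * (n + 1)
    nxt = {}  # nxt[c] = smallest state j > i reached on c, maintained right to left
    for i in range(n, -1, -1):
        delta[i] = dict(nxt)
        if i > 0:
            nxt[s[i - 1]] = i
    return delta


def accepts(delta, p):
    """Run the standard automaton on p; every state is accepting."""
    state = 0
    for c in p:
        if c not in delta[state]:
            return False
        state = delta[state][c]
    return True


def build_naive_default_automaton(s):
    """Naive illustration (an assumption of this sketch, not the paper's method):
    state i has one regular transition on s[i] to i+1 and a default transition
    to i+1 that does not consume the current character."""
    n = len(s)
    regular = [{s[i]: i + 1} for i in range(n)] + [{}]
    default = [i + 1 for i in range(n)] + [None]
    return regular, default


def accepts_with_defaults(regular, default, p):
    """Run the default-transition automaton on p.
    Returns (accepted, longest run of consecutive default transitions seen)."""
    state = 0
    max_delay = 0
    for c in p:
        delay = 0
        while c not in regular[state]:
            if default[state] is None:
                return False, max_delay  # no regular and no default transition
            state = default[state]       # follow default, character not consumed
            delay += 1
            max_delay = max(max_delay, delay)
        state = regular[state][c]        # regular transition consumes c
    return True, max_delay


if __name__ == "__main__":
    s = "abcab"
    delta = build_subsequence_automaton(s)
    regular, default = build_naive_default_automaton(s)
    for p in ["acb", "ba", "cc"]:
        print(p, accepts(delta, p), accepts_with_defaults(regular, default, p))
```

Both automata accept the same language; the naive variant trades a much smaller transition count for a delay that can grow linearly in $n$, which is the kind of trade-off the parameter $k$ is meant to control more finely.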
