Subsequence Automata with Default Transitions

Let S be a string of length n with characters from an alphabet of size $$\sigma$$. The subsequence automaton of S (often called the directed acyclic subsequence graph) is the minimal deterministic finite automaton accepting all subsequences of S. A straightforward construction shows that the size (number of states and transitions) of the subsequence automaton is $$O(n\sigma)$$ and that this bound is asymptotically optimal. In this paper, we consider subsequence automata with default transitions, that is, special transitions that are taken only if none of the regular transitions match the current character and that do not consume the current character. We show that with default transitions much smaller subsequence automata are possible, and we provide a full trade-off between the size of the automaton and the delay, i.e., the maximum number of consecutive default transitions followed before consuming a character. Specifically, given any integer parameter k, $$1 < k \le \sigma$$, we present a subsequence automaton with default transitions of size $$O(nk\log_k \sigma)$$ and delay $$O(\log_k \sigma)$$. Hence, with $$k = 2$$ we obtain an automaton of size $$O(n\log\sigma)$$ and delay $$O(\log\sigma)$$. At the other extreme, with $$k = \sigma$$, we obtain an automaton of size $$O(n\sigma)$$ and delay $$O(1)$$, thus matching the bound for the standard subsequence automaton construction. The key component of our result is a novel hierarchical automata construction of independent interest.
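To make "default transition" and "delay" concrete, here is a minimal Python sketch. It is not the hierarchical construction described above. It contrasts the standard $$O(n\sigma)$$ subsequence automaton with a deliberately naive automaton of our own: a chain where each state has one regular and one default transition. The chain has size $$O(n)$$, but its delay can reach n. The function names `build_standard`, `build_chain`, and `run` are hypothetical. The sketch also assumes that states are numbered 0..n, that state i means the first i characters of S have been consumed, and that every state is accepting.

```python
from typing import Dict, List, Optional, Tuple

Transitions = List[Dict[str, int]]


def build_standard(s: str) -> Transitions:
    """Standard subsequence automaton: states 0..n, state i means s[:i] consumed.

    delta[i][c] = j + 1 for the smallest j >= i with s[j] == c. A state can
    have up to sigma outgoing transitions, so the total size is O(n * sigma).
    """
    n = len(s)
    delta: Transitions = [{} for _ in range(n + 1)]
    next_occ: Dict[str, int] = {}  # char -> target state of its next occurrence
    for i in range(n, -1, -1):      # sweep right to left
        delta[i] = dict(next_occ)   # copy the current next-occurrence table
        if i > 0:
            next_occ[s[i - 1]] = i
    return delta


def build_chain(s: str) -> Tuple[Transitions, List[Optional[int]]]:
    """Hypothetical naive automaton with default transitions (not the paper's).

    State i has one regular transition (on s[i], to i + 1) and one default
    transition (to i + 1), so the size is O(n) but the delay can reach n.
    """
    n = len(s)
    regular: Transitions = [{s[i]: i + 1} if i < n else {} for i in range(n + 1)]
    default: List[Optional[int]] = [i + 1 if i < n else None for i in range(n + 1)]
    return regular, default


def run(regular: Transitions, default: Optional[List[Optional[int]]],
        pattern: str) -> Tuple[bool, int]:
    """Feed pattern to the automaton; return (accepted, max observed delay).

    Every state is accepting, so a pattern is accepted iff no character gets
    stuck. Default transitions are followed only when no regular transition
    matches, and they do not consume the current character.
    """
    state, max_delay = 0, 0
    for c in pattern:
        delay = 0
        while c not in regular[state]:
            target = default[state] if default is not None else None
            if target is None:
                return False, max_delay  # no regular or default transition: reject
            state = target               # default transition: c is not consumed
            delay += 1
            max_delay = max(max_delay, delay)
        state = regular[state][c]        # regular transition: consume c
    return True, max_delay


if __name__ == "__main__":
    s = "abcab"
    std = build_standard(s)
    reg, dflt = build_chain(s)
    print("standard size:", len(std) + sum(len(d) for d in std))
    print("chain size:", len(reg) + sum(len(r) for r in reg)
          + sum(d is not None for d in dflt))
    for p in ["acb", "ba", "cc"]:
        print(p, run(std, None, p), run(reg, dflt, p))
```

Both automata accept exactly the subsequences of S, because the chain's default transitions simply walk forward to the next occurrence that the standard automaton jumps to directly. The two sit at opposite ends of the size/delay spectrum. The standard automaton pays up to σ transitions per state for zero delay, while the chain pays a delay of up to n for constant out-degree. The parameter k in the abstract selects a point between such behaviours; the sketch makes no attempt to reproduce that hierarchy.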
