The Smallest Automaton Recognizing the Subwords of a Text

Let a partial deterministic finite automaton be a DFA in which each state need not have a transition edge for each letter of the alphabet. We demonstrate that the smallest partial DFA for the set of all subwords of a given word w, |w| > 2, has at most 2|w| - 2 states and 3|w| - 4 transition edges, independently of the alphabet size. We give an algorithm to build this smallest partial DFA from the input w on-line in linear time.
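
To make the idea of an on-line, linear-size subword recognizer concrete, the following is a minimal sketch, not the algorithm of this paper. It uses the widely known incremental suffix-automaton construction and treats every state as accepting, so the result recognizes exactly the subwords of w. This is a closely related structure, and it is not guaranteed to be the smallest partial DFA described above; it may have up to 2|w| - 1 states. The names `build_subword_automaton` and `accepts` are hypothetical and chosen only for illustration.

```python
def build_subword_automaton(w):
    """Build a partial DFA for the subwords of w, one letter at a time.

    Assumption: this is the standard suffix-automaton construction with
    every state treated as accepting, not the paper's own algorithm.
    Returns a list of dicts; trans[s][ch] is the target of state s on ch.
    State 0 is the start state.
    """
    trans = [{}]     # outgoing edges of each state (missing key = no edge)
    link = [-1]      # suffix link of each state
    length = [0]     # length of the longest word reaching each state
    last = 0         # state reached by the whole prefix read so far

    for ch in w:
        cur = len(trans)
        trans.append({})
        link.append(-1)
        length.append(length[last] + 1)

        # Add the new edge along the suffix-link chain where it is missing.
        p = last
        while p != -1 and ch not in trans[p]:
            trans[p][ch] = cur
            p = link[p]

        if p == -1:
            link[cur] = 0
        else:
            q = trans[p][ch]
            if length[p] + 1 == length[q]:
                link[cur] = q
            else:
                # Split q: the clone takes over the shorter words.
                clone = len(trans)
                trans.append(dict(trans[q]))
                link.append(link[q])
                length.append(length[p] + 1)
                while p != -1 and trans[p].get(ch) == q:
                    trans[p][ch] = clone
                    p = link[p]
                link[q] = clone
                link[cur] = clone
        last = cur

    return trans


def accepts(trans, s):
    """Return True if s can be read from the start state (s is a subword)."""
    state = 0
    for ch in s:
        if ch not in trans[state]:
            return False  # partial DFA: a missing edge means rejection
        state = trans[state][ch]
    return True
```

For example, `accepts(build_subword_automaton("abcbc"), "cbc")` follows three edges and succeeds, while the query `"cc"` fails at the second letter because the state reached on `c` has no outgoing `c` edge.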
