Bisimulation Minimisation of Weighted Automata on Unranked Trees

Several automaton models operate on unranked trees. Two well-known examples are the stepwise unranked tree automaton (suta) and the parallel unranked tree automaton (puta). By attaching a weight, taken from some semiring, to every transition, we generalise these two qualitative models to quantitative ones, obtaining weighted stepwise unranked tree automata (wsuta) and weighted parallel unranked tree automata (wputa); the qualitative models are recovered by choosing the Boolean semiring. The weighted versions have applications in natural language processing, XML-based data management, and quantitative information retrieval. We address the minimisation problem for wsuta and wputa using forward and backward bisimulations, and we prove the following results: (1) for every wsuta, an equivalent forward (resp. backward) bisimulation-minimal wsuta can be computed in time O(mn), where n is the number of states and m is the number of transitions of the given wsuta; (2) the same result holds for wputa in place of wsuta; (3) if the semiring is additive cancellative or is the Boolean semiring, then the bound improves to O(m log n) for both wsuta and wputa; (4) for every deterministic puta, a minimal equivalent deterministic puta can be computed in time O(m log n); (5) the models wsuta, wputa, and weighted unranked tree automaton have the same computational power.
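To make the role of the semiring concrete, the following is a minimal sketch of how a weighted stepwise automaton might assign a weight to an unranked tree. It is an illustration under stated assumptions, not the paper's formal definition: the names `Semiring`, `Wsuta`, `state_weights`, `evaluate`, and `toy` are hypothetical, the semiring is assumed commutative, and the automaton is assumed to consist of label-dependent initial weights, step transitions that consume one child at a time, and final weights. Instantiating the same automaton over the Boolean semiring yields plain acceptance, while the natural numbers count accepting runs.

```python
from typing import Any, Callable, NamedTuple


class Semiring(NamedTuple):
    """Hypothetical container for a (commutative) semiring."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
NATURAL = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


class Wsuta(NamedTuple):
    """Assumed shape of a weighted stepwise automaton (illustrative only)."""
    init: dict   # (label, state) -> weight of starting in state at a node
    step: dict   # (state, child_state, next_state) -> transition weight
    final: dict  # state -> final weight


def state_weights(sr, aut, tree):
    """Map each state to the summed weight of all runs reaching it at the root."""
    label, children = tree
    current = {q: w for (a, q), w in aut.init.items() if a == label}
    for child in children:  # consume the children from left to right
        child_w = state_weights(sr, aut, child)
        nxt = {}
        for (q, p, r), w in aut.step.items():
            if q in current and p in child_w:
                term = sr.mul(sr.mul(current[q], child_w[p]), w)
                nxt[r] = sr.add(nxt.get(r, sr.zero), term)
        current = nxt
    return current


def evaluate(sr, aut, tree):
    """Sum, over all states, the run weight times the final weight."""
    total = sr.zero
    for q, w in state_weights(sr, aut, tree).items():
        if q in aut.final:
            total = sr.add(total, sr.mul(w, aut.final[q]))
    return total


def toy(one):
    """Toy automaton: every node freely picks state x or y and keeps it."""
    states = ("x", "y")
    return Wsuta(
        init={("a", q): one for q in states},
        step={(q, p, q): one for q in states for p in states},
        final={q: one for q in states},
    )


tree = ("a", [("a", []), ("a", [])])             # the tree a(a, a)
count = evaluate(NATURAL, toy(1), tree)          # number of accepting runs
accepted = evaluate(BOOLEAN, toy(True), tree)    # qualitative acceptance
```

In this toy automaton each of the three nodes chooses one of two states independently, so the natural-number semiring counts the runs, whereas the Boolean semiring only records that at least one run exists.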
