A Better N-Best List: Practical Determinization of Weighted Finite Tree Automata

Ranked lists of output trees from syntactic statistical NLP applications frequently contain multiple repeated entries. This redundancy misrepresents tree weights and leaves less information for debugging and tuning. It is chiefly due to nondeterminism in the weighted automata that produce the results. We introduce an algorithm that determinizes such automata while preserving proper weights, so that each tree with multiple derivations is returned once, weighted by the sum of the weights of all its derivations. We also demonstrate our algorithm's effectiveness on two large-scale tasks.
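
The weight-preserving determinization summarized above can be illustrated with a small sketch. It assumes a bottom-up weighted tree automaton over the real (probability) semiring with positive weights and an acyclic rule set (for example, a packed forest), and it uses a weighted subset construction that normalizes each subset by its total weight. That normalization choice, the rule and tree encodings, and the functions `tree_weight`, `determinize`, and `det_tree_weight` are hypothetical and serve only this illustration; this is not the paper's implementation.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical encoding for this sketch (not the paper's):
#   rules: list of (symbol, child_states, parent_state, weight), bottom-up
#   final: dict mapping state -> final weight
#   trees: nested tuples (symbol, child1, child2, ...); a leaf is (symbol,)


def tree_weight(rules, final, tree):
    """Weight the nondeterministic automaton gives `tree`: sum over all derivations."""
    def inside(t):
        sym, kids = t[0], t[1:]
        kid_maps = [inside(k) for k in kids]
        out = defaultdict(Fraction)  # state -> summed weight of derivations of t
        for s, qs, q, w in rules:
            if s == sym and len(qs) == len(kids):
                prod = Fraction(w)
                for m, qi in zip(kid_maps, qs):
                    prod *= m.get(qi, 0)
                if prod:
                    out[q] += prod
        return out
    return sum(w * final.get(q, 0) for q, w in inside(tree).items())


def determinize(rules, final):
    """Weighted subset construction; assumes positive weights and an acyclic automaton."""
    by_key = defaultdict(list)  # (symbol, arity) -> [(child_states, parent, weight)]
    for s, qs, q, w in rules:
        by_key[(s, len(qs))].append((qs, q, w))
    det_states, index = [], {}  # det state = frozenset of (state, residual weight)
    det_rules = {}  # (symbol, child det ids) -> (parent det id, transition weight)
    changed = True
    while changed:
        changed = False
        for (sym, k), rs in by_key.items():
            for kids in product(range(len(det_states)), repeat=k):
                if (sym, kids) in det_rules:
                    continue
                residuals = [dict(det_states[i]) for i in kids]
                acc = defaultdict(Fraction)
                for qs, q, w in rs:
                    prod = Fraction(w)
                    for r, qi in zip(residuals, qs):
                        prod *= r.get(qi, 0)
                    if prod:
                        acc[q] += prod
                if not acc:
                    continue  # no rule applies to this combination
                norm = sum(acc.values())  # total weight, moved onto the transition
                state = frozenset((q, v / norm) for q, v in acc.items())
                if state not in index:
                    index[state] = len(det_states)
                    det_states.append(state)
                    changed = True
                det_rules[(sym, kids)] = (index[state], norm)
    det_final = {i: sum(r * final.get(q, 0) for q, r in s)
                 for i, s in enumerate(det_states)}
    return det_rules, det_final


def det_tree_weight(det_rules, det_final, tree):
    """Single deterministic run: multiply transition weights, then the final weight."""
    def run(t):
        kids = [run(k) for k in t[1:]]
        if any(k is None for k in kids):
            return None
        hit = det_rules.get((t[0], tuple(i for i, _ in kids)))
        if hit is None:
            return None
        w = hit[1]
        for _, kw in kids:
            w *= kw
        return hit[0], w
    res = run(tree)
    return 0 if res is None else res[1] * det_final[res[0]]


# Toy automaton: f(x, x) has two derivations (via a and via b), g(x) has one.
rules = [
    ("x", (), "a", Fraction(1, 2)),
    ("x", (), "b", Fraction(1, 2)),
    ("f", ("a", "a"), "s", Fraction(1, 4)),
    ("f", ("b", "b"), "s", Fraction(1, 4)),
    ("g", ("a",), "s", Fraction(1, 5)),
]
final = {"s": Fraction(1)}
det_rules, det_final = determinize(rules, final)
for t in [("f", ("x",), ("x",)), ("g", ("x",))]:
    print(t, tree_weight(rules, final, t), det_tree_weight(det_rules, det_final, t))
```

In this toy automaton, each of the two derivations of f(x, x) has weight 1/16, so a derivation-level n-best list would rank g(x) (1/10) first and list f(x, x) twice, even though the total weight of f(x, x) is 1/8. After determinization, both columns agree (1/8 and 1/10), and each tree has at most one run, so an n-best extraction over the deterministic automaton cannot repeat a tree.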
