Bootstrapping Lexical Choice via Multiple-Sequence Alignment

An important component of any generation system is the mapping dictionary, a lexicon of elementary semantic expressions and corresponding natural language realizations. Typically, labor-intensive knowledge-based methods are used to construct the dictionary. We instead propose to acquire it automatically via a novel multiple-pass algorithm employing multiple-sequence alignment, a technique commonly used in bioinformatics. Crucially, our method leverages latent information contained in multi-parallel corpora, datasets that supply several verbalizations of the corresponding semantics rather than just one. We used our techniques to generate natural language versions of computer-generated mathematical proofs, with good results on both a per-component and overall-output basis. For example, in evaluations involving a dozen human judges, our system produced output whose readability and faithfulness to the semantic input rivaled that of a traditional generation system.
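To make the alignment idea concrete, the sketch below aligns two verbalizations word by word with a standard Needleman-Wunsch global alignment. It is only a pairwise building block, not the multiple-pass multiple-sequence algorithm the abstract describes. The function name `align`, the scoring values, and the example sentences are all assumptions made for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two token lists (Needleman-Wunsch).

    Returns a list of (token_a, token_b) pairs; None marks a gap.
    Scoring values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] aligned to a gap
                score[i][j - 1] + gap,      # b[j-1] aligned to a gap
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((a[i - 1], b[j - 1]))
                i -= 1
                j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((a[i - 1], None))
            i -= 1
        else:
            pairs.append((None, b[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


# Two invented verbalizations of the same proof step.
s1 = "we assume that a equals b".split()
s2 = "assume a equals b".split()
for left, right in align(s1, s2):
    print(f"{left or '-':>8}  {right or '-'}")
```

In the aligned output, columns where the tokens agree ("assume", "a", "equals", "b") suggest shared structure, while gapped or mismatched columns mark places where verbalizations differ. Extending this to several sentences at once is what multiple-sequence alignment addresses.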

Regina Barzilay | Lillian Lee

[1] Daniel L. Chester, et al. The Translation of Formal Proofs into English, 1976, Artif. Intell.

[2] Rance Cleaveland, et al. Implementing mathematics with the Nuprl proof development system, 1986.

[3] Robert L. Mercer, et al. The Mathematics of Statistical Machine Translation: Parameter Estimation, 1993, CL.

[4] Chris Brew, et al. Automatic Evaluation of Computer Generated Text: A Progress Report on the TextEval Project, 1994, HLT.

[5] Tao Jiang, et al. On the Complexity of Multiple Sequence Alignment, 1994, J. Comput. Biol.

[6] T.J.P. Hubbard, et al. Gathering them in to the fold, 1996, Nature Structural Biology.

[7] H. Thompson, et al. Automatic Evaluation of Computer Generated Text: Final Report on the TextEval Project, 1996.

[8] Salim Roukos, et al. Feature-based language understanding, 1997, EUROSPEECH.

[9] Dan Gusfield. Algorithms on Strings, Trees, and Sequences - Computer Science and Computational Biology, 1997.

[10] Xiaorong Huang, et al. Proof Verbalization as an Application of NLG, 1997, IJCAI.

[11] Kevin Knight, et al. Generation that Exploits Corpus-Based Statistical Knowledge, 1998, ACL.

[12] R. Durbin, et al. Biological sequence analysis: Background on probability, 1998.

[13] Volker Sorge, et al. LΩUI: Lovely ΩMEGA User Interface, 1999, Formal Aspects of Computing.

[14] Robert L. Constable, et al. Verbalization of High-Level Formal Proofs, 1999, AAAI/IAAI.

[15] Michel Simard. Text-Translation Alignment: Three Languages Are Better Than Two, 1999, EMNLP.

[16] Adwait Ratnaparkhi, et al. Trainable Methods for Surface Natural Language Generation, 2000, ANLP.

[17] I. Dan Melamed, et al. Models of translation equivalence among words, 2000, CL.

[18] Hermann Ney, et al. Improved Statistical Alignment Models, 2000, ACL.

[19] Srinivas Bangalore, et al. Exploiting a Probabilistic Hierarchical Model for Generation, 2000, COLING.

[20] Menno van Zaanen. Bootstrapping Syntax and Recursion using Alignment-Based Learning, 2000, ICML.

[21] Alexander I. Rudnicky, et al. Stochastic Language Generation for Spoken Dialogue Systems, 2000.

[22] Andreas Stolcke, et al. Finding consensus in speech recognition: word error minimization and other applications of confusion networks, 2000, Comput. Speech Lang.

[23] Hermann Ney, et al. Natural language understanding using statistical machine translation, 2001, INTERSPEECH.

[24] Regina Barzilay, et al. Extracting Paraphrases from a Parallel Corpus, 2001, ACL.

[25] Ehud Reiter, et al. Book Reviews: Building Natural Language Generation Systems, 2000, CL.

[26] Salim Roukos, et al. Bleu: a Method for Automatic Evaluation of Machine Translation, 2002, ACL.