Combining machine learning and rule-based approaches in Spanish syntactic generation

This thesis describes a Spanish Generation grammar that combines hand-written rules with Machine Learning techniques. The grammar belongs to a full-scale, commercial-quality Machine Translation (MT) system developed at Microsoft Research. The first part presents the grammar and the main linguistic strategies it implements. Robustness in everyday, real-world use of the MT system demands extra effort from the Generator. This is handled by adding a Pre-Generation layer that repairs the input to Generation and guarantees its integrity, without introducing ad hoc elements into the grammar rules. In the second part we explore the use of Decision Tree (DT) classifiers to learn automatically one of the operations performed in the Pre-Generation component, namely lexical selection of the Spanish copula (ser or estar). We show that the contexts for this non-trivial linguistic phenomenon can be inferred from examples with high accuracy.
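The second part treats copula selection as a classification problem: given a copular clause, predict ser or estar. A minimal sketch of that framing is an ID3-style decision-tree learner (information-gain splits) in plain Python. It assumes a hypothetical feature set (`pred_type`, `stage_level`, `event_subject`, `agent_phrase`) and eight hand-labelled toy clauses. Neither the features nor the data come from the thesis, and the thesis may use a different tree learner.

```python
from collections import Counter
from math import log2

# Hypothetical features describing a copular clause (illustrative only,
# not the feature set used in the thesis).
FEATURES = ["pred_type", "stage_level", "event_subject", "agent_phrase"]

def ctx(pred_type, stage_level="n/a", event_subject="no", agent_phrase="no"):
    """Build one feature dictionary for a copular clause."""
    return {"pred_type": pred_type, "stage_level": stage_level,
            "event_subject": event_subject, "agent_phrase": agent_phrase}

# Toy hand-labelled examples; the class label is the copula lemma.
TRAIN = [
    (ctx("noun"), "ser"),                              # Juan es médico
    (ctx("adj", stage_level="no"), "ser"),             # María es española
    (ctx("adj", stage_level="yes"), "estar"),          # La sopa está fría
    (ctx("loc"), "estar"),                             # El libro está en la mesa
    (ctx("loc", event_subject="yes"), "ser"),          # La reunión es en la sala
    (ctx("gerund"), "estar"),                          # Estoy leyendo
    (ctx("participle", stage_level="no", agent_phrase="yes"), "ser"),  # La casa fue construida por ellos
    (ctx("participle", stage_level="yes"), "estar"),   # La puerta está cerrada
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (first seen wins on ties)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, features):
    """ID3: greedily split on the feature with the highest information gain."""
    if len(set(labels)) == 1 or not features:
        return majority(labels)  # leaf node: a plain label string

    def gain(feature):
        remainder = 0.0
        for value in {r[feature] for r in rows}:
            subset = [l for r, l in zip(rows, labels) if r[feature] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)
    rest = [f for f in features if f != best]
    branches = {}
    for value in {r[best] for r in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs], rest)
    # Internal node: (feature, branches, fallback label for unseen values)
    return (best, branches, majority(labels))

def classify(tree, row):
    """Follow branches until a leaf (a label string) is reached."""
    while isinstance(tree, tuple):
        feature, branches, fallback = tree
        tree = branches.get(row[feature], fallback)
    return tree

TREE = build_tree([r for r, _ in TRAIN], [l for _, l in TRAIN], FEATURES)

print(classify(TREE, ctx("adj", stage_level="yes")))    # Ana está enferma -> estar
print(classify(TREE, ctx("loc", event_subject="yes")))  # El concierto es en el parque -> ser
```

On this toy data the learned tree splits first on `stage_level`, then on `pred_type`, and uses `event_subject` only to separate locative readings. With real input, most of the difficulty would lie in extracting such features reliably from the analysis passed to Generation.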

Melero Nogués | Maria Teresa | María Teresa Navés Nogués