Paper Information - NP Subject Detection in Verb-initial Arabic Clauses


Phrase re-ordering is a well-known obstacle to robust machine translation for language pairs with significantly different word orderings. For Arabic-English, two languages that usually differ in the ordering of subject and verb, the subject and its modifiers must be accurately moved to produce a grammatical translation. This operation requires more than base phrase chunking and often defies current phrase-based statistical decoders. We present a conditional random field sequence classifier that detects the full scope of Arabic noun phrase subjects in verb-initial clauses with an F-score (β = 1) of 61.3%, a 5.0% absolute improvement over a statistical parser baseline. We suggest methods for integrating the classifier output with a statistical decoder and present preliminary machine translation results.
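The F-score with β = 1 quoted above is the harmonic mean of precision and recall. As a minimal sketch of how such a score could be computed for a sequence classifier that marks subject spans, the Python below assumes a simple B/I/O tag scheme and exact-match span scoring. Neither assumption is stated in the abstract, and the names `bio_to_spans` and `span_f_beta` are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect each maximal B/I run as one (start, end) span.

    Assumption: tags are "B" (span start), "I" (inside), "O" (outside).
    An "I" with no open span is treated leniently as a span start.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.add((start, i))  # close the span that was open
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))  # span runs to the end of the clause
    return spans


def span_f_beta(gold: Set[Span], pred: Set[Span], beta: float = 1.0) -> float:
    """F-beta over exactly matching spans; beta = 1 gives the usual F1."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up tag sequences: the second predicted span is one token too short,
# so under exact-match scoring it counts as both a miss and a false alarm.
gold_tags = ["O", "B", "I", "I", "O", "B", "I"]
pred_tags = ["O", "B", "I", "I", "O", "B", "O"]
score = span_f_beta(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
print(f"F1 = {score:.3f}")
```

Under exact-match scoring a subject whose boundary is off by a single token earns no credit, which is one way to read the abstract's emphasis on detecting the full scope of the subject.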

Christopher D. Manning | Spence Green | Conal Sathi
