Clause alignment for Hong Kong legal texts: A lexical-based approach

In this paper we report on our recent work on clause alignment for English-Chinese bilingual legal texts. The approach uses available lexical resources, including a bilingual legal glossary and a bilingual dictionary, and aims to acquire examples at various linguistic levels for example-based machine translation. We formulate a measure of the similarity of a candidate pair of clauses in terms of their matched lexical items, and implement a clause alignment algorithm based on this measure. Experimental results show that the similarity measure and the lexical-based clause alignment algorithm, though very simple, are effective, achieving 94.6% alignment accuracy. This confirms our intuition that lexical information gives a reliable indication of correct alignment. The significance of this lexical-based approach lies in both its simplicity and its effectiveness.
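
To make the idea of scoring a candidate clause pair by its matched lexical items concrete, the following is a minimal sketch and not the measure or algorithm used in the paper, whose exact formulation is not given in this abstract. It assumes a toy glossary (`GLOSSARY`), a similarity defined as the fraction of known English terms whose translation appears in the Chinese clause, and a simple monotone one-to-one alignment by dynamic programming. All names and example clauses are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy glossary: English term -> acceptable Chinese renderings.
GLOSSARY: Dict[str, Set[str]] = {
    "court": {"法院", "法庭"},
    "person": {"人", "人士"},
    "offence": {"罪行"},
    "fine": {"罰款"},
}


def similarity(en_clause: str, zh_clause: str, glossary: Dict[str, Set[str]]) -> float:
    """Assumed measure: share of glossary terms in the English clause
    that have at least one translation occurring in the Chinese clause."""
    words = set(en_clause.lower().split())
    known = [w for w in words if w in glossary]
    if not known:
        return 0.0
    matched = sum(1 for w in known if any(t in zh_clause for t in glossary[w]))
    return matched / len(known)


def align(en: List[str], zh: List[str], glossary: Dict[str, Set[str]]) -> List[Tuple[int, int]]:
    """Monotone 1-1 alignment that maximises total similarity; clauses may be skipped."""
    n, m = len(en), len(zh)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = similarity(en[i - 1], zh[j - 1], glossary)
            best[i][j] = max(best[i - 1][j], best[i][j - 1], best[i - 1][j - 1] + s)

    # Trace back from the end to recover the aligned index pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        s = similarity(en[i - 1], zh[j - 1], glossary)
        if s > 0 and best[i][j] == best[i - 1][j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    en_clauses = ["the court may impose a fine", "a person commits an offence"]
    zh_clauses = ["法院可判處罰款", "該人即屬犯有罪行"]
    print(align(en_clauses, zh_clauses, GLOSSARY))
```

In this sketch, pairs with no matched lexical item score zero and are never aligned, which reflects the abstract's intuition that lexical matches are the signal driving alignment.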
