Paper information

Multi-feature Based Chinese-English Named Entity Extraction from Comparable Corpora

Bilingual Named Entity extraction is important for cross-language information processing tasks such as machine translation (MT) and cross-lingual information retrieval (CLIR). Much previous work extracted bilingual Named Entities from parallel corpora. Here we propose a multi-feature based method for extracting bilingual Named Entities from comparable corpora. We first recognize Chinese and English Named Entities in the Chinese and English parts of the comparable corpus, respectively. Feature scores are then calculated for every possible pair of Chinese and English Named Entities. Finally, we combine these feature scores to decide which pairs are mutual translations. To calculate the translation score, we did not use the IBM Model 1 formula as previous approaches did. Instead, we used a modified edit distance that takes word order into consideration. Experiments show that this method increased the F-score by 11%. With the multi-feature integration strategy, encouraging results are obtained.
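The abstract does not spell out how the edit distance was modified or how the feature scores are combined, so the following is only a minimal sketch under stated assumptions: a word-level Levenshtein distance in which substituting a Chinese word by an English word is free when a bilingual lexicon lists it as a translation, followed by a plain weighted sum of feature scores. The function names, the lexicon format, the feature names, and the weights are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def word_edit_distance(src_words: List[str], tgt_words: List[str],
                       lexicon: Dict[str, Set[str]]) -> float:
    """Word-level Levenshtein distance between two named entities.

    Assumption: a substitution costs 0 when the lexicon lists the target
    word as a translation of the source word, and 1 otherwise.
    """
    m, n = len(src_words), len(tgt_words)
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = float(i)  # delete every source word
    for j in range(1, n + 1):
        dist[0][j] = float(j)  # insert every target word
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            translations = lexicon.get(src_words[i - 1], set())
            sub_cost = 0.0 if tgt_words[j - 1].lower() in translations else 1.0
            dist[i][j] = min(dist[i - 1][j] + 1.0,            # deletion
                             dist[i][j - 1] + 1.0,            # insertion
                             dist[i - 1][j - 1] + sub_cost)   # substitution
    return dist[m][n]


def translation_score(src_words: List[str], tgt_words: List[str],
                      lexicon: Dict[str, Set[str]]) -> float:
    """Normalize the distance into a similarity in [0, 1] (hypothetical choice)."""
    longest = max(len(src_words), len(tgt_words))
    if longest == 0:
        return 0.0
    return 1.0 - word_edit_distance(src_words, tgt_words, lexicon) / longest


def combined_score(feature_scores: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Weighted sum of feature scores; the weights are illustrative only."""
    return sum(weights[name] * score for name, score in feature_scores.items())


# Toy lexicon, purely for illustration.
lexicon = {"北京": {"beijing"}, "大学": {"university"}}
in_order = translation_score(["北京", "大学"], ["Beijing", "University"], lexicon)
swapped = translation_score(["北京", "大学"], ["University", "Beijing"], lexicon)
```

With this toy lexicon, `in_order` is 1.0 and `swapped` is 0.0, which illustrates how an edit-distance score is sensitive to word order in a way that a bag-of-words translation probability is not. Whether the paper penalizes reordering this strictly is not stated in the abstract.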

Jun Zhao | Min Lu | Mingmiao Lu
