Improved named entity translation and bilingual named entity extraction

Translation of named entities (NE), including proper names, temporal and numerical expressions, is very important in multilingual natural language processing, like crosslingual information retrieval and statistical machine translation. We present an integrated approach to extract a named entity translation dictionary from a bilingual corpus while at the same time improving the named entity annotation quality. Starting from a bilingual corpus where the named entities are extracted independently for each language, a statistical alignment model is used to align the named entities. An iterative process is applied to extract named entity pairs with higher alignment probability. This leads to a smaller but cleaner named entity translation dictionary and also to a significant improvement of the monolingual named entity annotation quality for both languages. Experimental result shows that the dictionary size is reduced by 51.8% and the annotation quality is improved from 70.03 to 78.15 for Chinese and 73.38 to 81.46 in terms of F-score.