Ranking vs. Classification: A Case Study in Mining Organization Name Translation from Snippets

Both classification and ranking strategies have been reported to work well for mining named entity (NE) translations from the snippets returned by web search engines. Taking organization name translation, the most challenging case, as an example, this paper conducts a contrastive study of the two strategies under the SVM framework. We show empirically that translation ranking achieves the best performance across various data settings, with a best Top-1 precision of up to 65.75%. We conclude that, compared with the classification strategy, the ranking strategy is better suited to such snippet-based translation mining, in which the imbalanced data problem prevails.
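To make the two quantities in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy data layout in which each source name has a list of scored candidates, and it uses the standard pairwise construction for ranking SVM training data; the function names, feature vectors, and scores are hypothetical. It shows how Top-1 precision can be computed and how pairwise examples built within one query come out balanced even when correct candidates are rare.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (candidate_id, model_score, is_correct)
Candidate = Tuple[str, float, bool]


def top1_precision(scored: Dict[str, List[Candidate]]) -> float:
    """Fraction of source names whose highest-scoring candidate is correct."""
    if not scored:
        return 0.0
    hits = 0
    for candidates in scored.values():
        if not candidates:
            continue  # a name with no candidates counts as a miss
        best = max(candidates, key=lambda c: c[1])
        hits += int(best[2])
    return hits / len(scored)


def pairwise_examples(
    candidates: List[Tuple[List[float], bool]]
) -> List[Tuple[List[float], int]]:
    """Build pairwise training examples for one source name.

    Each (correct, incorrect) pair yields the feature difference with
    label +1 and the negated difference with label -1, so the two
    labels are equally frequent by construction.
    """
    pos = [f for f, ok in candidates if ok]
    neg = [f for f, ok in candidates if not ok]
    pairs: List[Tuple[List[float], int]] = []
    for p in pos:
        for n in neg:
            diff = [a - b for a, b in zip(p, n)]
            pairs.append((diff, 1))
            pairs.append(([-d for d in diff], -1))
    return pairs


if __name__ == "__main__":
    scored = {
        "name_1": [("cand_a", 0.9, True), ("cand_b", 0.2, False)],
        "name_2": [("cand_c", 0.8, False), ("cand_d", 0.3, True)],
    }
    print(top1_precision(scored))

    # One correct candidate among three: imbalanced as a classification
    # problem, balanced once converted to pairwise examples.
    feats = [([1.0, 0.0], True), ([0.0, 1.0], False), ([0.5, 0.5], False)]
    for diff, label in pairwise_examples(feats):
        print(label, diff)
```

In a classification setup the same three candidates would give one positive and two negative examples, and the ratio typically worsens as more snippet candidates are extracted per name.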
