
Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.
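The idea of learning word correspondences directly from rankings can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a score that sums learned weights over all (query word, document word) pairs, an exponential pairwise loss, and a greedy update that adjusts one word pair per round by a fixed step. The names `score`, `pair_features`, and `boost`, the step size, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def pair_features(query, doc):
    """All (query word, document word) pairs that co-occur for this query and document."""
    return set(product(set(query), set(doc)))


def score(weights, query, doc):
    """Sum the learned weights of every word pair active for this query and document."""
    return sum(weights.get(f, 0.0) for f in pair_features(query, doc))


def boost(preferences, rounds=10, step=0.5):
    """Greedy coordinate updates on an exponential pairwise ranking loss.

    preferences: list of (query, relevant_doc, irrelevant_doc), each a list of tokens.
    Assumption: a fixed step size is used instead of a line search.
    """
    weights = {}
    for _ in range(rounds):
        # Negative gradient of sum(exp(-margin)) with respect to each pair weight.
        direction = defaultdict(float)
        for query, d_pos, d_neg in preferences:
            margin = score(weights, query, d_pos) - score(weights, query, d_neg)
            pair_weight = math.exp(-margin)  # badly ranked pairs count more
            pos = pair_features(query, d_pos)
            neg = pair_features(query, d_neg)
            for f in pos - neg:
                direction[f] += pair_weight
            for f in neg - pos:
                direction[f] -= pair_weight
        if not direction:
            break
        # Update only the single word pair with the steepest descent direction.
        best = max(direction, key=lambda f: abs(direction[f]))
        weights[best] = weights.get(best, 0.0) + step * math.copysign(1.0, direction[best])
    return weights


# Toy example: one Japanese query, one relevant and one irrelevant English document.
prefs = [(["特許", "検索"], ["patent", "search"], ["weather", "report"])]
learned = boost(prefs, rounds=5)
print(score(learned, *prefs[0][:2]), score(learned, prefs[0][0], prefs[0][2]))
```

Because only one word pair changes per round, the learned weight table stays sparse even though the space of possible word pairs is the cross-product of both vocabularies, which is the scaling concern the abstract raises.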

Stefan Riezler | Artem Sokolov | Laura Jehl | Felix Hieber
