A study of user profile representation for personalized cross-language information retrieval

Purpose – With the growing amount of multilingual content on the World Wide Web, users often need to access information written in a language of which they are not native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and to investigate their use in personalized cross-language information retrieval (CLIR) systems by means of personalized query expansion.

Design/methodology/approach – The user profiles consist of weighted terms computed using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and on cross-lingual comparable documents. The paper also proposes an automatic evaluation method for comparing different user profile generation techniques and query expansion methods.

Findings – Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suita...
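The abstract names the profile weighting schemes (tf-idf, BM25, latent semantic models) and the use of the profile for personalized query expansion, but gives no formulas. The following is a minimal sketch under stated assumptions, not the paper's method. Documents are assumed to be pre-tokenized lists of strings. The tf-idf variant uses an assumed smoothed idf, log((N + 1) / (df + 1)). The BM25 weights use the common defaults k1 = 1.2 and b = 0.75 with the non-negative idf log(1 + (N - df + 0.5) / (df + 0.5)). The latent weighting assumes an already-trained topic model (training is not shown) and scores each term as the sum over topics of θ_k · φ_k(t). Expansion simply appends the top-k profile terms not already in the query. All function names are hypothetical, and the cross-lingual step (mapping expansion terms into the document language) is omitted.

```python
import math
from collections import Counter

# Hypothetical helpers sketching term-weighted user profiles and
# profile-based query expansion; not the paper's exact formulation.


def document_frequencies(collection):
    """Count how many background documents contain each term."""
    df = Counter()
    for doc in collection:
        df.update(set(doc))
    return df


def tfidf_profile(user_docs, collection):
    """Sum term frequencies over the user's documents and weight them by a
    smoothed idf, log((N + 1) / (df + 1)) (an assumed variant)."""
    n = len(collection)
    df = document_frequencies(collection)
    tf = Counter()
    for doc in user_docs:
        tf.update(doc)
    return {t: f * math.log((n + 1) / (df[t] + 1)) for t, f in tf.items()}


def bm25_profile(user_docs, collection, k1=1.2, b=0.75):
    """Accumulate per-document BM25 term weights over the user's documents,
    using the non-negative idf log(1 + (N - df + 0.5) / (df + 0.5))."""
    n = len(collection)
    df = document_frequencies(collection)
    avgdl = sum(len(d) for d in collection) / n
    weights = Counter()
    for doc in user_docs:
        dl = len(doc)
        for t, f in Counter(doc).items():
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = f + k1 * (1 - b + b * dl / avgdl)
            weights[t] += idf * f * (k1 + 1) / norm
    return dict(weights)


def topic_profile(theta, phi):
    """Latent weighting: w(t) = sum_k theta[k] * phi[k][t], where theta is the
    user's topic mixture and phi[k] maps terms to p(t | topic k).
    Both are assumed to come from an already-trained topic model."""
    if len(theta) != len(phi):
        raise ValueError("theta and phi must have one entry per topic")
    weights = Counter()
    for p_k, word_dist in zip(theta, phi):
        for t, p in word_dist.items():
            weights[t] += p_k * p
    return dict(weights)


def expand_query(query, profile, k=3):
    """Append the k highest-weighted profile terms not already in the query;
    ties are broken alphabetically so the result is deterministic."""
    seen = set(query)
    candidates = sorted(
        ((w, t) for t, w in profile.items() if t not in seen),
        key=lambda pair: (-pair[0], pair[1]),
    )
    return list(query) + [t for _, t in candidates[:k]]


if __name__ == "__main__":
    collection = [
        ["query", "expansion", "retrieval"],
        ["cross", "language", "retrieval"],
        ["user", "profile", "retrieval"],
        ["topic", "model"],
    ]
    history = [["cross", "language", "retrieval"], ["cross", "language", "user"]]
    profile = tfidf_profile(history, collection)
    print(expand_query(["retrieval"], profile, k=2))
    # prints ['retrieval', 'cross', 'language']
```

Because expand_query only consumes a term-to-weight mapping, tfidf_profile, bm25_profile and topic_profile can be swapped without touching the expansion step, which is the kind of side-by-side comparison of profile representations the abstract describes.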
