Leveraging User Reviews to Improve Accuracy for Mobile App Retrieval

Smartphones and tablets, together with their apps, have pervaded our everyday lives, creating a new demand for search tools that help users find the right apps to satisfy their immediate needs. While a few commercial mobile app search engines are available, the new task of mobile app retrieval has not yet been rigorously studied. Indeed, no test collection yet exists for quantitatively evaluating this retrieval task. In this paper, we first study the effectiveness of state-of-the-art retrieval models for the app retrieval task using a new app retrieval test collection that we created. We then propose and study a novel approach that generates a new representation for each app. Our key idea is to leverage user reviews to identify important features of apps and to bridge the vocabulary gap between app developers and users. Specifically, we jointly model app descriptions and user reviews using a topic model in order to generate app representations while excluding noise in the reviews. Experimental results indicate that the proposed approach is effective and outperforms state-of-the-art retrieval models for app retrieval.
