Panorama: a semantic-aware application search framework

Third-party applications (commonly referred to as apps) have proliferated on the web and mobile platforms in recent years. The tremendous number of apps available in app marketplaces makes effective app search engines a necessity. However, existing app search engines typically ignore the latent semantics in the app corpus and thus usually fail to provide high-quality app snippets and effective app rankings. In this paper, we present a novel framework named Panorama to provide independent search results for Android apps with semantic awareness. We first propose the App Topic Model (ATM) to discover the latent semantics in the app corpus. Based on the discovered semantics, we tackle two central challenges faced by current app search engines: (1) how to generate concise and informative snippets for apps and (2) how to rank apps effectively with respect to search queries. To handle the first challenge, we propose several new metrics for measuring the quality of the sentences in app descriptions and develop a greedy algorithm for app snippet generation with a fixed probability guarantee of near-optimal performance. To handle the second challenge, we propose a variety of new features for app ranking and design a new type of inverted index to support efficient top-k app retrieval. We conduct extensive experiments on a large-scale collection of Android apps and build an app search engine prototype for human-based performance evaluation. The proposed framework outperforms several strong baselines on a range of metrics.
