Semantic Search of Mobile Applications Using Word Embeddings

This paper proposes a set of approaches for the semantic search of mobile applications based on their names and on the unstructured textual information contained in their descriptions. The proposed approaches make use of word-level, character-level, and contextual word embeddings that have been trained or fine-tuned on a dataset of about 500 thousand mobile apps, collected in the scope of this work. The approaches have been evaluated on a public dataset that includes information about 43 thousand applications and 56 manually annotated non-exact queries. Our results show that both character-level embeddings trained on our data and fine-tuned RoBERTa models surpass the performance of the other retrieval strategies reported in the literature.

2012 ACM Subject Classification: Information systems → Retrieval models and ranking; Information systems → Document representation; Information systems → Language models; Information systems → Search engine indexing; Information systems → Similarity measures; Computing methodologies → Machine learning
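The core retrieval idea described above can be sketched as follows: embed the query and each app's name plus description into a shared vector space, then rank apps by cosine similarity. The sketch below uses a toy hand-made word-vector table and mean pooling purely for illustration; the paper's actual systems use word-level, character-level, and fine-tuned RoBERTa embeddings trained on real app data, and the vocabulary, vectors, and app entries here are invented.

```python
import numpy as np

# Toy 3-d word vectors standing in for embeddings trained on app data.
# These values are illustrative only, not from the paper.
VECS = {
    "photo":  np.array([1.0, 0.1, 0.0]),
    "photos": np.array([0.9, 0.1, 0.1]),
    "editor": np.array([0.9, 0.2, 0.1]),
    "edit":   np.array([0.95, 0.15, 0.05]),
    "camera": np.array([0.8, 0.0, 0.2]),
    "music":  np.array([0.0, 1.0, 0.1]),
    "player": np.array([0.1, 0.9, 0.0]),
}

def embed(text):
    """Mean of the word vectors of known words; zero vector if none match."""
    vecs = [VECS[w] for w in text.lower().split() if w in VECS]
    return np.mean(vecs, axis=0) if vecs else np.zeros(3)

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0

def search(query, apps):
    """Rank (name, description) pairs by embedding similarity to the query."""
    q = embed(query)
    scored = [(name, cosine(q, embed(name + " " + desc)))
              for name, desc in apps]
    return sorted(scored, key=lambda t: t[1], reverse=True)

apps = [("PhotoFix", "photo editor camera"),
        ("TuneBox", "music player")]
print(search("edit photos", apps))  # non-exact query still matches PhotoFix
```

Note how the non-exact query "edit photos" ranks the photo app first even though neither query word appears verbatim in its name: the semantic closeness of "edit"/"editor" and "photos"/"photo" in the embedding space drives the match, which is exactly the failure mode of keyword search that the paper's embedding-based approaches address.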
