Empirical comparison of text-based mobile apps similarity measurement techniques

Code-free software similarity detection techniques have been used to support different software engineering tasks, including clustering mobile applications (apps). The way similarity is measured may affect both the efficiency and the quality of clustering solutions. However, there has been no previous comparative study of feature extraction methods used to guide mobile app clustering. In this paper, we investigate different techniques to compute the similarity of apps based on their textual descriptions and evaluate their effectiveness using hierarchical agglomerative clustering. To this end, we carry out an empirical study comparing five different techniques, based on topic modelling and keyword feature extraction, to cluster 12,664 apps randomly sampled from the Google Play App Store. The comparison is based on three main criteria: silhouette width measure, human judgement and execution time. The results of our study show that topic modelling and the collocation-based and dependency-based feature extractors perform similarly in detecting app-feature similarity. However, dependency-based feature extraction performs better than any other technique in finding application domain similarity (ρ = 0.7, p-value < 0.01). Current categorisation in the app store studied does not exhibit a good classification quality in terms of the claimed feature space. However, better quality can be achieved using a good feature extraction technique and a traditional clustering method.
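
The evaluation pipeline summarised above (text-derived similarity, hierarchical agglomerative clustering, silhouette width) can be illustrated with a minimal sketch. It is not one of the five techniques compared in the study: plain bag-of-words term counts stand in for the topic-model and keyword feature extractors, average linkage is an assumed linkage choice, and the four app descriptions are invented toy data. All function names are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term counts (assumed stand-in for a real feature extractor)."""
    return Counter(text.lower().split())

def cosine_distance(a, b):
    """1 - cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - (dot / norm if norm else 0.0)

def agglomerative(dist, k):
    """Average-linkage agglomerative clustering, merging until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find the pair of clusters with the smallest average pairwise distance.
        x, y = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

def silhouette_width(dist, clusters):
    """Mean silhouette over all points; needs at least two clusters, singletons score 0."""
    scores = []
    for c in clusters:
        for i in c:
            if len(c) == 1:
                scores.append(0.0)
                continue
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)  # cohesion
            b = min(sum(dist[i][j] for j in o) / len(o) for o in clusters if o is not c)  # separation
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy app descriptions (invented for illustration only).
descriptions = [
    "edit photo filters camera",
    "camera photo editor filters",
    "track running workout fitness",
    "fitness workout tracker running",
]
vectors = [tf_vector(d) for d in descriptions]
dist = [[cosine_distance(u, v) for v in vectors] for u in vectors]
clusters = agglomerative(dist, k=2)
score = silhouette_width(dist, clusters)
print(sorted(sorted(c) for c in clusters), round(score, 2))
```

On this toy input the two photo apps and the two fitness apps end up in separate clusters. Swapping `tf_vector` for a different feature extractor while keeping the clustering and scoring steps fixed mirrors the kind of comparison the abstract describes, though the real study also relies on human judgement and execution time.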
