CHAMP: Characterizing Undesired App Behaviors from User Comments Based on Market Policies

Millions of mobile apps are available through various app markets. Although most app markets enforce a number of automated or even manual mechanisms to vet each app before it is released, thousands of low-quality apps still exist in different markets, some of which violate explicitly stated market policies. To identify these violations accurately and in a timely manner, we resort to user comments, which provide immediate feedback to app market maintainers, to identify undesired behaviors that violate market policies, including security-related user concerns. Specifically, we present the first large-scale study to detect and characterize the correlations between user comments and market policies. First, we propose CHAMP, an approach that adopts text mining and natural language processing (NLP) techniques to extract semantic rules through a semi-automated process, and classifies comments into 26 pre-defined types of undesired behaviors that violate market policies. Our evaluation on real-world user comments shows that it achieves both high precision and high recall (>0.9) in classifying comments for undesired behaviors. Then, we curate a large-scale comment dataset (over 3 million user comments) from apps in Google Play and 8 popular alternative Android app markets, and apply CHAMP to understand the characteristics of undesired behavior comments in the wild. The results confirm our speculation that user comments can be used to pinpoint suspicious apps that violate policies declared by app markets. The study also reveals that policy violations are widespread in many app markets despite their extensive vetting efforts. CHAMP can serve as a whistleblower that assigns policy-violation scores to apps and identifies their most informative comments.
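As a rough illustration of the idea of rule-based comment classification followed by a per-app score, the sketch below matches comments against a few hand-written patterns and reports the fraction of flagged comments. Everything in it is an assumption made for illustration: the three behavior labels, the regular expressions, the function names `classify` and `violation_score`, and the scoring formula are hypothetical and are not the semantic rules, the 26 behavior types, or the scoring scheme that CHAMP actually uses.

```python
import re

# Hypothetical rules: behavior label -> regex patterns.
# Illustrative stand-ins only, not the semantic rules CHAMP extracts.
RULES = {
    "ad_disruption": [r"\btoo many ads\b", r"\bads? (?:pop|popping) up\b"],
    "privacy_leak": [r"\b(?:steals?|leaks?) (?:my )?(?:data|contacts|information)\b"],
    "payment_fraud": [r"\bcharged (?:me )?without\b"],
}

# Compile once so repeated classification does not re-parse the patterns.
COMPILED = {
    label: [re.compile(p, re.IGNORECASE) for p in patterns]
    for label, patterns in RULES.items()
}


def classify(comment):
    """Return the set of behavior labels whose rules match the comment."""
    return {
        label
        for label, patterns in COMPILED.items()
        if any(p.search(comment) for p in patterns)
    }


def violation_score(comments):
    """Assumed score: fraction of comments matching at least one rule."""
    if not comments:
        return 0.0
    flagged = sum(1 for c in comments if classify(c))
    return flagged / len(comments)


if __name__ == "__main__":
    sample = [
        "Too many ads, they pop up everywhere",
        "Great app, love it",
        "This app steals my contacts",
        "It charged me without asking",
    ]
    for text in sample:
        print(sorted(classify(text)), "<-", text)
    print("score:", violation_score(sample))
```

A keyword-pattern matcher like this is far cruder than rules derived with text mining and NLP, but it shows the shape of the pipeline: each comment maps to zero or more behavior labels, and the labels are aggregated into a per-app signal.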
