Datasets of Android Applications: a Literature Review

Mobile phones and tablets have become the most widely used computing devices, and the Android platform dominates among them. As a result, the development of Android applications has surged and has become a major field of study, with research efforts covering energy efficiency, code smells, performance, maintainability, and security, among other topics. These kinds of challenges call for dedicated solutions, tools, and datasets. This survey identifies and reviews 31 existing datasets of Android applications and classifies each of them according to key features, such as the total number of apps it contains, whether the commit history of the apps is available, whether it focuses on the source code or on the executable binaries of the apps, and the sources used for building the dataset. This study can benefit both experienced and novice researchers interested in doing research on Android apps, who can use its results as a map for identifying the most suitable datasets for their research objectives.
