A Systematic Assessment on Android Third-party Library Detection Tools

Third-party libraries (TPLs) have become a significant part of the Android ecosystem, and developers rely on them to speed up app development. Unfortunately, the popularity of TPLs also brings new security issues. For example, TPLs may carry malicious or vulnerable code, which can infect popular apps and threaten mobile users. TPL detection is also essential for downstream tasks such as vulnerability and malware detection. Various tools have therefore been developed to identify TPLs. However, no existing work has studied these TPL detection tools in detail, and the tools target different applications, use different techniques, and differ in performance. A comprehensive understanding of these tools will help us make better use of them. To fill this gap, we conduct a comprehensive empirical study that evaluates and compares all publicly available TPL detection tools on six criteria: accuracy of TPL construction, effectiveness, efficiency, accuracy of version identification, resiliency to code obfuscation, and ease of use. In addition, we enhance these open-source tools by fixing their limitations to improve their detection ability. Finally, we build an extensible framework that integrates all available TPL detection tools and provides an online service for the research community. We release the evaluation dataset and the enhanced tools. Based on our study, we also present the essential findings and discuss promising implications for the community, for example: 1) Most existing TPL detection techniques depend, to some degree, on package structure to construct in-app TPL candidates. However, using package structure as the module decoupling feature is error-prone; we therefore suggest that future researchers use class dependencies instead of package structure. 2) Extracted features that carry richer semantic information (e.g., class dependencies) achieve better resiliency to code obfuscation. 3) Existing tools usually have low recall because they ignore some features of Android apps and TPLs, such as the compilation mechanism, the new formats of TPLs, and TPL dependencies. Most existing tools cannot effectively detect partially imported TPLs or obfuscated TPLs, which directly limits their capability. 4) Existing tools are complementary to each other, so a better tool can be built by combining the advantages of each. We believe our work provides a clear picture of existing TPL detection techniques and a roadmap for future research.
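
To make finding 1 concrete, the following minimal Python sketch contrasts the two ways of splitting an app's classes into TPL candidates. It is an illustration under stated assumptions, not the method of any tool in the study: the function names, the toy class names, and the choice of connected components as the dependency-based grouping are all hypothetical. The toy data assumes an obfuscator has flattened two unrelated libraries into the same package `a`.

```python
from collections import defaultdict


def group_by_package(classes):
    """Group class names by package prefix (everything before the last dot)."""
    groups = defaultdict(set)
    for name in classes:
        package = name.rsplit(".", 1)[0] if "." in name else ""
        groups[package].add(name)
    return sorted(sorted(group) for group in groups.values())


def group_by_dependency(classes, dependencies):
    """Group classes into connected components of an undirected dependency graph.

    Assumption: connected components stand in for whatever dependency-based
    decoupling a real tool would use.
    """
    adjacency = {name: set() for name in classes}
    for src, dst in dependencies:
        if src in adjacency and dst in adjacency:
            adjacency[src].add(dst)
            adjacency[dst].add(src)

    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal collecting every class reachable from `start`.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical toy input: two libraries whose classes were renamed into package "a".
CLASSES = ["a.a", "a.b", "a.c", "a.d"]
DEPENDENCIES = [("a.a", "a.b"), ("a.c", "a.d")]

by_package = group_by_package(CLASSES)
by_dependency = group_by_dependency(CLASSES, DEPENDENCIES)
```

On this toy input, package-based grouping merges all four classes into a single candidate, while dependency-based grouping separates them into two candidates. Real apps also contain dependencies between app code and libraries, and between libraries, so a practical approach would need more than plain connected components.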
