Ungrafting Malicious Code from Piggybacked Android Apps

To devise e cient approaches and tools for detecting malicious code in the Android ecosystem, researchers are increasingly required to have deep understanding of malware. There is thus a need to provide a framework for dissecting malware and localizing malicious program fragments within app code in order to build a comprehensive dataset of malicious samples. In this paper we address this need with an approach for listing malicious packages in an app based on code graph analysis. To that end we focus on piggybacked apps, which are benign apps repackaged with malicious payload. Our approach classifies each app independently from its potential clones based on machine learning, and detects piggybacked apps with a precision of about 97%. With the built classifier we were also able to find new piggybacked apps in market datasets, outside our ground truth. We also identify malicious packages with an accuracy@5 of 83% and an accuracy@1 of around 68%. We further demonstrate the importance of collecting malicious packages by using them to build a performant malware detection system.

[1]  Chanchal K. Roy,et al.  A Survey on Software Clone Detection Research , 2007 .

[2]  Hao Chen,et al.  Attack of the Clones: Detecting Cloned Applications on Android Markets , 2012, ESORICS.

[3]  Jacques Klein,et al.  Potential Component Leaks in Android Apps: An Investigation into a New Feature Set for Malware Detection , 2015, 2015 IEEE International Conference on Software Quality, Reliability and Security.

[4]  Trevor Hastie,et al.  The Elements of Statistical Learning , 2001 .

[5]  Gerardo Canfora,et al.  A Classifier of Malicious Android Applications , 2013, 2013 International Conference on Availability, Reliability and Security.

[6]  David Brumley,et al.  BitShred: feature hashing malware for scalable triage and semantic analysis , 2011, CCS '11.

[7]  Duncan J. Watts,et al.  Collective dynamics of ‘small-world’ networks , 1998, Nature.

[8]  Lei Zhang,et al.  Towards a scalable resource-driven approach for detecting repackaged Android applications , 2014, ACSAC.

[9]  Konrad Rieck,et al.  DREBIN: Effective and Explainable Detection of Android Malware in Your Pocket , 2014, NDSS.

[10]  Wenke Lee,et al.  McBoost: Boosting Scalability in Malware Collection and Analysis Using Statistical Classification of Executables , 2008, 2008 Annual Computer Security Applications Conference (ACSAC).

[11]  Peng Liu,et al.  Achieving accuracy and scalability simultaneously in detecting application clones on Android markets , 2014, ICSE.

[12]  Jianping Yin,et al.  Malicious Codes Detection Based on Ensemble Learning , 2007, ATC.

[13]  Anthony Desnos,et al.  Android: Static Analysis Using Similarity Distance , 2012, 2012 45th Hawaii International Conference on System Sciences.

[14]  Marcus A. Maloof,et al.  Learning to Detect and Classify Malicious Executables in the Wild , 2006, J. Mach. Learn. Res..

[15]  Philip S. Yu,et al.  GPLAG: detection of software plagiarism by program dependence graph analysis , 2006, KDD '06.

[16]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[17]  Hao Chen,et al.  AnDarwin: Scalable Detection of Android Application Clones Based on Semantics , 2015, IEEE Transactions on Mobile Computing.

[18]  Hahn-Ming Lee,et al.  DroidMat: Android Malware Detection through Manifest and API Calls Tracing , 2012, 2012 Seventh Asia Joint Conference on Information Security.

[19]  Jun Zhang,et al.  Clonewise - Detecting Package-Level Clones Using Machine Learning , 2013, SecureComm.

[20]  Jacques Klein,et al.  An Investigation into the Use of Common Libraries in Android Apps , 2015, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[21]  Sakir Sezer,et al.  A New Android Malware Detection Approach Using Bayesian Classification , 2013, 2013 IEEE 27th International Conference on Advanced Information Networking and Applications (AINA).

[22]  Leo Breiman,et al.  Random Forests , 2001, Machine Learning.

[23]  Kang G. Shin,et al.  Large-scale malware indexing using function-call graphs , 2009, CCS.

[24]  Jules White,et al.  Applying machine learning classifiers to dynamic Android malware detection at scale , 2013, 2013 9th International Wireless Communications and Mobile Computing Conference (IWCMC).

[25]  Hossain Shahriar,et al.  Detection of repackaged Android Malware , 2014, The 9th International Conference for Internet Technology and Secured Transactions (ICITST-2014).

[26]  D. Rubin,et al.  Maximum likelihood from incomplete data via the EM - algorithm plus discussions on the paper , 1977 .

[27]  Olga Gadyatskaya,et al.  FSquaDRA: Fast Detection of Repackaged Applications , 2014, DBSec.

[28]  Latifur Khan,et al.  A Machine Learning Approach to Android Malware Detection , 2012, 2012 European Intelligence and Security Informatics Conference.

[29]  Arun Lakhotia,et al.  DroidLegacy: Automated Familial Classification of Android Malware , 2014, PPREW'14.

[30]  Yajin Zhou,et al.  Detecting repackaged smartphone applications in third-party android marketplaces , 2012, CODASPY '12.

[31]  Jacques Klein,et al.  Parameter Values of Android APIs: A Preliminary Study on 100,000 Apps , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[32]  Jason Nieh,et al.  A measurement study of google play , 2014, SIGMETRICS '14.

[33]  Jacques Klein,et al.  IccTA: Detecting Inter-Component Privacy Leaks in Android Apps , 2015, 2015 IEEE/ACM 37th IEEE International Conference on Software Engineering.

[34]  Yajin Zhou,et al.  Fast, scalable detection of "Piggybacked" mobile applications , 2013, CODASPY.

[35]  Jacques Klein,et al.  Automatically Exploiting Potential Component Leaks in Android Applications , 2014, 2014 IEEE 13th International Conference on Trust, Security and Privacy in Computing and Communications.

[36]  Stan Jarzabek,et al.  A Data Mining Approach for Detecting Higher-Level Clones in Software , 2009, IEEE Transactions on Software Engineering.

[37]  Yang Xiang,et al.  Classification of malware using structured control flow , 2010 .

[38]  Brenda S. Baker,et al.  On finding duplication and near-duplication in large software systems , 1995, Proceedings of 2nd Working Conference on Reverse Engineering.

[39]  Salvatore J. Stolfo,et al.  On the feasibility of online malware detection with performance counters , 2013, ISCA.

[40]  Yajin Zhou,et al.  Dissecting Android Malware: Characterization and Evolution , 2012, 2012 IEEE Symposium on Security and Privacy.