In recent years, smartphone adoption has grown rapidly, accompanied by an influx of applications. Users can purchase applications or download them for free onto their mobile phones from centralized application markets such as Google’s Android Market and Amazon’s third-party market. Despite the rapidly increasing volume of applications available, these marketplaces often review applications only cursorily, and many applications go unreviewed because of the vast number of submissions. Markets largely rely on user policing and reporting to detect applications that misbehave or misrepresent their functionality. As the incidence of piracy and malware has increased, this reactive approach is neither scalable nor reliable, and it places too much responsibility on end users. To automate the process of identifying problematic applications, we previously proposed Juxtapp, a scalable infrastructure for code similarity analysis among Android applications. Juxtapp finds instances of malware, piracy, and vulnerable code by detecting code reuse among applications. Such a system must be scalable and fast, so in this paper we discuss the distributed implementation of Juxtapp. We evaluate Juxtapp’s performance on up to 95,000 Android applications and find that the parallelized system is able to analyze applications rapidly. To aid users in their analysis, we introduce a web service that automatically manages the resources required to run distributed Juxtapp, and we evaluate the performance of this service. As a complementary approach to similarity analysis, we propose DStruct, a tool for detecting similar Android applications based on their directory structures. DStruct offers another way to address problems in Android security, including determining whether applications are pirated or contain instances of known malware.
We evaluate DStruct using more than 58,000 Android applications from the official Android market and a Chinese third-party market. In our experiments, DStruct detects 3 pirated variants of a popular paid game and 9 instances of malicious applications on the Chinese market. Furthermore, on the official market, DStruct detects 4 legitimate applications that malicious authors had repackaged with malware. We discuss the efficacy of DStruct and provide further insights into improving detection using similarity analysis tools such as ours.
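To make the idea of directory-structure similarity concrete, the following is a minimal sketch and not DStruct's actual algorithm, which this abstract does not specify. It assumes an application package can be read as a ZIP archive, represents each application by the set of directory paths it contains, and scores a pair with Jaccard similarity. The function names `directory_set`, `jaccard`, and `make_archive` are hypothetical and introduced only for illustration.

```python
import io
import zipfile


def directory_set(apk_bytes):
    """Collect every directory path implied by the archive's entries.

    Assumption: the package is a ZIP archive and directory paths alone
    (no file names or contents) serve as the comparison features.
    """
    dirs = set()
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as archive:
        for name in archive.namelist():
            parts = name.strip("/").split("/")
            # A trailing slash marks a directory entry; otherwise drop the file name.
            if not name.endswith("/"):
                parts = parts[:-1]
            # Record each ancestor directory, e.g. "res" and "res/layout".
            for depth in range(1, len(parts) + 1):
                dirs.add("/".join(parts[:depth]))
    return frozenset(dirs)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def make_archive(paths):
    """Build an in-memory ZIP with empty files at the given paths (demo helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for path in paths:
            archive.writestr(path, b"")
    return buf.getvalue()


if __name__ == "__main__":
    original = make_archive([
        "AndroidManifest.xml",
        "res/layout/main.xml",
        "assets/levels/1.dat",
    ])
    # A hypothetical repackaged variant that adds one extra directory.
    repackaged = make_archive([
        "AndroidManifest.xml",
        "res/layout/main.xml",
        "assets/levels/1.dat",
        "assets/payload/x.bin",
    ])
    score = jaccard(directory_set(original), directory_set(repackaged))
    print(f"directory-structure similarity: {score:.2f}")
```

A high score between a paid application and an unrelated listing, or between a known malicious sample and a market application, would flag the pair for closer inspection; any decision threshold would have to be chosen empirically.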