SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic

We present SAMPLES: Self Adaptive Mining of Persistent LExical Snippets; a systematic framework for classifying network traffic generated by mobile applications. SAMPLES constructs conjunctive rules, in an automated fashion, through a supervised methodology over a set of labeled flows (the training set). Each conjunctive rule corresponds to the lexical context, associated with an application identifier found in a snippet of the HTTP header, and is defined by: (a) the identifier type, (b) the HTTP header-field it occurs in, and (c) the prefix/suffix surrounding its occurrence. Subsequently, these conjunctive rules undergo an aggregate-and-validate step for improving accuracy and determining a priority order. The refined rule-set is then loaded into an application-identification engine where it operates at a per flow granularity, in an extract-and-lookup paradigm, to identify the application responsible for a given flow. Thus, SAMPLES can facilitate important network measurement and management tasks --- e.g. behavioral profiling [29], application-level firewalls [21,22] etc. --- which require a more detailed view of the underlying traffic than that afforded by traditional protocol/port based methods. We evaluate SAMPLES on a test set comprising 15 million flows (approx.) generated by over 700 K applications from the Android, iOS and Nokia market-places. SAMPLES successfully identifies over 90% of these applications with 99% accuracy on an average. This, in spite of the fact that fewer than 2% of the applications are required during the training phase, for each of the three market places. This is a testament to the universality and the scalability of our approach. We, therefore, expect SAMPLES to work with reasonable coverage and accuracy for other mobile platforms --- e.g. BlackBerry and Windows Mobile --- as well.

[1]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[2]  Anja Feldmann,et al.  A First Look at Mobile Hand-Held Device Traffic , 2010, PAM.

[3]  Xuxian Jiang,et al.  Unsafe exposure analysis of mobile in-app advertisements , 2012, WISEC '12.

[4]  Helen J. Wang,et al.  Discoverer: Automatic Protocol Reverse Engineering from Network Traces , 2007, USENIX Security Symposium.

[5]  Qiang Xu,et al.  FLOWR: a self-learning system for classifying mobileapplication traffic , 2014, SIGMETRICS '14.

[6]  Narseo Vallina-Rodriguez,et al.  Breaking for commercials: characterizing mobile advertising , 2012, Internet Measurement Conference.

[7]  Suman Banerjee,et al.  Capturing mobile experience in the wild: a tale of two apps , 2013, CoNEXT.

[8]  Dawn Xiaodong Song,et al.  Understanding Mobile App Usage Patterns Using In-App Advertisements , 2013, PAM.

[9]  Andrzej Duda,et al.  Markov chain fingerprinting to classify encrypted traffic , 2014, IEEE INFOCOM 2014 - IEEE Conference on Computer Communications.

[10]  David A. Wagner,et al.  AdDroid: privilege separation for applications and advertisers in Android , 2012, ASIACCS '12.

[11]  Cecilia Mascolo,et al.  Don't kill my ads!: balancing privacy in an ad-supported mobile application market , 2012, HotMobile '12.

[12]  Grenville J. Armitage,et al.  A survey of techniques for internet traffic classification using machine learning , 2008, IEEE Communications Surveys & Tutorials.

[13]  Qiang Xu,et al.  Identifying diverse usage behaviors of smartphone apps , 2011, IMC '11.

[14]  Fulvio Risso,et al.  Per-user policy enforcement on mobile apps through network functions virtualization , 2014, MobiArch '14.

[15]  Deborah Estrin,et al.  A first look at traffic on smartphones , 2010, IMC '10.

[16]  Judith Kelner,et al.  A Survey on Internet Traffic Identification , 2009, IEEE Communications Surveys & Tutorials.

[17]  Qiang Xu,et al.  Automatic generation of mobile app signatures from traffic observations , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).

[18]  Dawn Xiaodong Song,et al.  NetworkProfiler: Towards automatic fingerprinting of Android apps , 2013, 2013 Proceedings IEEE INFOCOM.

[19]  Michalis Faloutsos,et al.  ProfileDroid: multi-layer profiling of android applications , 2012, Mobicom '12.

[20]  S. Begum,et al.  Sequence Alignment , 2018, Beginners Guide to Bioinformatics for High Throughput Sequencing.

[21]  Jason Nieh,et al.  A measurement study of google play , 2014, SIGMETRICS '14.

[22]  Boundary detection in tokenizing network application payload for anomaly detection , 2003 .

[23]  Fulvio Risso,et al.  MAPPER: A Mobile Application Personal Policy Enforcement Router for Enterprise Networks , 2014, 2014 Third European Workshop on Software Defined Networks.

[24]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.