Can Android Applications Be Identified Using Only TCP/IP Headers of Their Launch Time Traffic?

The ability to identify mobile apps in network traffic has significant implications in many domains, including traffic management, malware detection, and maintaining user privacy. App identification methods in the literature typically use deep packet inspection (DPI) and analyze HTTP headers to extract app fingerprints. However, these methods cannot be used if HTTP traffic is encrypted. We investigate whether Android apps can be identified from their launch-time network traffic using only TCP/IP headers. We first capture network traffic of 86,109 app launches by repeatedly running 1,595 apps on 4 distinct Android devices. We then use supervised learning methods used previously in the web page identification literature, to identify the apps that generated the traffic. We find that: (i) popular Android apps can be identified with 88% accuracy, by using the packet sizes of the first 64 packets they generate, when the learning methods are trained and tested on the data collected from same device; (ii) when the data from an unseen device (but similar operating system/vendor) is used for testing, the apps can be identified with 67% accuracy; (iii) the app identification accuracy does not drop significantly even if the training data are stale by several days, and (iv) the accuracy does drop quite significantly if the operating system/vendor is very different. We discuss the implications of our findings as well as open issues.

[1]  Dirk Grunwald,et al.  Legal issues surrounding monitoring during network research , 2007, IMC '07.

[2]  Hannes Federrath,et al.  Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier , 2009, CCSW '09.

[3]  Ivan Martinovic,et al.  Who do you sync you are?: smartphone fingerprinting via application behaviour , 2013, WiSec '13.

[4]  Tao Jin,et al.  Application-awareness in SDN , 2013, SIGCOMM.

[5]  Yajin Zhou,et al.  Detecting repackaged smartphone applications in third-party android marketplaces , 2012, CODASPY '12.

[6]  Minas Gjoka,et al.  AntMonitor: A System for Monitoring from Mobile Devices , 2015, C2BD@SIGCOMM.

[7]  Lili Qiu,et al.  Statistical identification of encrypted Web browsing traffic , 2002, Proceedings 2002 IEEE Symposium on Security and Privacy.

[8]  Yanghee Choi,et al.  Internet traffic classification demystified: on the sources of the discriminative power , 2010, CoNEXT.

[9]  Thomas Ristenpart,et al.  Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail , 2012, 2012 IEEE Symposium on Security and Privacy.

[10]  Yong Liao,et al.  AppPrint: Automatic Fingerprinting of Mobile Applications in Network Traffic , 2015, PAM.

[11]  Yong Liao,et al.  SAMPLES: Self Adaptive Mining of Persistent LExical Snippets for Classifying Mobile Application Traffic , 2015, MobiCom.

[12]  Dawn Xiaodong Song,et al.  NetworkProfiler: Towards automatic fingerprinting of Android apps , 2013, 2013 Proceedings IEEE INFOCOM.

[13]  Judith Kelner,et al.  A Survey on Internet Traffic Identification , 2009, IEEE Communications Surveys & Tutorials.

[14]  Nino Vincenzo Verde,et al.  Analyzing Android Encrypted Network Traffic to Identify User Actions , 2016, IEEE Transactions on Information Forensics and Security.

[15]  Brian Neil Levine,et al.  Inferring the source of encrypted HTTP connections , 2006, CCS '06.

[16]  Ling Huang,et al.  I Know Why You Went to the Clinic: Risks and Realization of HTTPS Traffic Analysis , 2014, Privacy Enhancing Technologies.

[17]  Phillip A. Porras,et al.  Clear and Present Data: Opaque Traffic and its Security Implications for the Future , 2013, NDSS.

[18]  Rui Wang,et al.  Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow , 2010, 2010 IEEE Symposium on Security and Privacy.

[19]  Qiang Xu,et al.  Automatic generation of mobile app signatures from traffic observations , 2015, 2015 IEEE Conference on Computer Communications (INFOCOM).