Automatic generation of mobile app signatures from traffic observations

There are network management, traffic engineering, and security practices adopted in today's networking that rely on the knowledge about what applications' traffic is passing through the networks. These practices might fail with mobile apps whose identity remains hidden in generic HTTP traffic. The main reason is that unlike traditional applications, most mobile apps do not use specific protocols or IP ports with distinctive features. Many enterprises and service providers are in a great need of regaining control over their networks that increasingly carry mobile traffic. In this paper we propose FLOWR, a system that automatically identifies mobile apps by continually learning the apps' distinguishing features via traffic analysis. FLOWR focuses solely on key-value pairs in HTTP headers and intelligently identifies the pairs suitable for app signatures. Our system employs a custom supervised learning approach that leverages a very limited knowledge of app-signature seeds and autonomously grows its capacity for app identification. The approach is motivated by a simple but effective hypothesis that unknown app-identifying features should co-occur with the known signatures. Our experimental results show a significant growth in flow identification coverage provided by FLOWR. Specifically, we show that FLOWR can achieve identification of 86-95% of flows related to their generating apps.

[1]  Dawn Xiaodong Song,et al.  Understanding Mobile App Usage Patterns Using In-App Advertisements , 2013, PAM.

[2]  Feng Qian,et al.  Profiling resource usage for mobile applications: a cross-layer approach , 2011, MobiSys '11.

[3]  Balachander Krishnamurthy,et al.  For sale : your data: by : you , 2011, HotNets-X.

[4]  Byung-Gon Chun,et al.  TaintDroid: An Information-Flow Tracking System for Realtime Privacy Monitoring on Smartphones , 2010, OSDI.

[5]  Balachander Krishnamurthy,et al.  On the leakage of personally identifiable information via online social networks , 2009, CCRV.

[6]  Collin Mulliner Privacy leaks in mobile phone internet access , 2010, 2010 14th International Conference on Intelligence in Next Generation Networks.

[7]  Oliver Spatscheck,et al.  Accurate, scalable in-network identification of p2p traffic using application signatures , 2004, WWW '04.

[8]  Balachander Krishnamurthy,et al.  Privacy leakage vs . Protection measures : the growing disconnect , 2011 .

[9]  Deborah Estrin,et al.  A first look at traffic on smartphones , 2010, IMC '10.

[10]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[11]  2015 IEEE Conference on Computer Communications, INFOCOM 2015, Kowloon, Hong Kong, April 26 - May 1, 2015 , 2015, IEEE Conference on Computer Communications.

[12]  Zhenkai Liang,et al.  Towards Automatic Discovery of Deviations in Binary Implementations with Applications to Error Detection and Fingerprint Generation , 2007, USENIX Security Symposium.

[13]  Dawn Xiaodong Song,et al.  NetworkProfiler: Towards automatic fingerprinting of Android apps , 2013, 2013 Proceedings IEEE INFOCOM.

[14]  Roy T. Fielding,et al.  Uniform Resource Identifiers (URI): Generic Syntax , 1998, RFC.

[15]  Clayton Shepard,et al.  LiveLab: measuring wireless networks and smartphone users in the field , 2011, SIGMETRICS Perform. Evaluation Rev..

[16]  Anja Feldmann,et al.  A First Look at Mobile Hand-Held Device Traffic , 2010, PAM.

[17]  Helen J. Wang,et al.  Discoverer: Automatic Protocol Reverse Engineering from Network Traces , 2007, USENIX Security Symposium.

[18]  Deborah Estrin,et al.  Diversity in smartphone usage , 2010, MobiSys '10.

[19]  Roy T. Fielding,et al.  Uniform Resource Identifier (URI): Generic Syntax , 2005, RFC.

[20]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[21]  Vern Paxson,et al.  Semi-automated discovery of application session structure , 2006, IMC '06.

[22]  Aleksandar Kuzmanovic,et al.  Measuring serendipity: connecting people, locations and interests in a mobile 3G network , 2009, IMC '09.

[23]  Qiang Xu,et al.  Identifying diverse usage behaviors of smartphone apps , 2011, IMC '11.

[24]  Michalis Faloutsos,et al.  ProfileDroid: multi-layer profiling of android applications , 2012, Mobicom '12.

[25]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[26]  David A. Wagner,et al.  Analyzing inter-application communication in Android , 2011, MobiSys '11.

[27]  Lixin Gao,et al.  Profiling users in a 3g network using hourglass co-clustering , 2010, MobiCom.