Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps

The rapid adoption of mobile messaging Apps has enabled us to collect massive amount of encrypted Internet traffic of mobile messaging. The classification of this traffic into different types of in-App service usages can help for intelligent network management, such as managing network bandwidth budget and providing quality of services. Traditional approaches for classification of Internet traffic rely on packet inspection, such as parsing HTTP headers. However, messaging Apps are increasingly using secure protocols, such as HTTPS and SSL, to transmit data. This imposes significant challenges on the performances of service usage classification by packet inspection. To this end, in this paper, we investigate how to exploit encrypted Internet traffic for classifying in-App usages. Specifically, we develop a system, named CUMMA, for classifying service usages of mobile messaging Apps by jointly modeling user behavioral patterns, network traffic characteristics, and temporal dependencies. Along this line, we first segment Internet traffic from traffic-flows into sessions with a number of dialogs in a hierarchical way. Also, we extract the discriminative features of traffic data from two perspectives: (i) packet length and (ii) time delay. Next, we learn a service usage predictor to classify these segmented dialogs into single-type usages or outliers. In addition, we design a clustering Hidden Markov Model (HMM) based method to detect mixed dialogs from outliers and decompose mixed dialogs into sub-dialogs of single-type usage. Indeed, CUMMA enables mobile analysts to identify service usages and analyze end-user in-App behaviors even for encrypted Internet traffic. Finally, the extensive experiments on real-world messaging data demonstrate the effectiveness and efficiency of the proposed method for service usage classification.

[1]  Deborah Estrin,et al.  Diversity in smartphone usage , 2010, MobiSys '10.

[2]  Dimitrios Gunopulos,et al.  Streaming Time Series Summarization Using User-Defined Amnesic Functions , 2008, IEEE Transactions on Knowledge and Data Engineering.

[3]  Jun Zhang,et al.  Network Traffic Classification Using Correlation Information , 2013, IEEE Transactions on Parallel and Distributed Systems.

[4]  Michalis Faloutsos,et al.  BLINC: multilevel traffic classification in the dark , 2005, SIGCOMM '05.

[5]  Hagit Shatkay,et al.  Approximate queries and representations for large data sequences , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[6]  Heikki Mannila,et al.  Time series segmentation for context recognition in mobile devices , 2001, Proceedings 2001 IEEE International Conference on Data Mining.

[7]  Sebastian Zander,et al.  Automated traffic classification and application identification using machine learning , 2005, The IEEE Conference on Local Computer Networks 30th Anniversary (LCN'05)l.

[8]  George Varghese,et al.  Network monitoring using traffic dispersion graphs (tdgs) , 2007, IMC '07.

[9]  Oliver Spatscheck,et al.  Accurate, scalable in-network identification of p2p traffic using application signatures , 2004, WWW '04.

[10]  Bhavik R. Bakshi,et al.  Representation of process trends—IV. Induction of real-time patterns from operating data for diagnosis and supervisory control , 1994 .

[11]  Michalis Faloutsos,et al.  Internet traffic classification demystified: myths, caveats, and the best practices , 2008, CoNEXT '08.

[12]  Anirban Mahanti,et al.  Traffic classification using clustering algorithms , 2006, MineNet '06.

[13]  Srinivasan Seshan,et al.  A quest for an Internet video quality-of-experience metric , 2012, HotNets-XI.

[14]  Sebastian Zander,et al.  A preliminary performance comparison of five machine learning algorithms for practical IP traffic flow classification , 2006, CCRV.

[15]  Qiang Xu,et al.  Identifying diverse usage behaviors of smartphone apps , 2011, IMC '11.

[16]  Riyad Alshammari,et al.  Machine learning based encrypted traffic classification: Identifying SSH and Skype , 2009, 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications.

[17]  Renata Teixeira,et al.  Traffic classification on the fly , 2006, CCRV.

[18]  Thomas Ristenpart,et al.  Peek-a-Boo, I Still See You: Why Efficient Traffic Analysis Countermeasures Fail , 2012, 2012 IEEE Symposium on Security and Privacy.

[19]  Jun Zhang,et al.  An Effective Network Traffic Classification Method with Unknown Flow Detection , 2013, IEEE Transactions on Network and Service Management.

[20]  Andrew W. Moore,et al.  Internet traffic classification using bayesian analysis techniques , 2005, SIGMETRICS '05.

[21]  Patrick Haffner,et al.  ACAS: automated construction of application signatures , 2005, MineNet '05.

[22]  Anthony McGregor,et al.  Flow Clustering Using Machine Learning Techniques , 2004, PAM.

[23]  Philip S. Yu,et al.  MALM: a framework for mining sequence database at multiple abstraction levels , 1998, CIKM '98.

[24]  Shuen-Lin Jeng,et al.  Time Series Classification Based on Spectral Analysis , 2007, Commun. Stat. Simul. Comput..

[25]  Dawn Xiaodong Song,et al.  Understanding Mobile App Usage Patterns Using In-App Advertisements , 2013, PAM.

[26]  Ivan Martinovic,et al.  Who do you sync you are?: smartphone fingerprinting via application behaviour , 2013, WiSec '13.

[27]  Michalis Faloutsos,et al.  Transport layer identification of P2P traffic , 2004, IMC '04.

[28]  Hannu Toivonen,et al.  Estimating the number of segments in time series data using permutation tests , 2002, 2002 IEEE International Conference on Data Mining, 2002. Proceedings..

[29]  Eamonn J. Keogh,et al.  An Enhanced Representation of Time Series Which Allows Fast and Accurate Classification, Clustering and Relevance Feedback , 1998, KDD.

[30]  Luca Salgarelli,et al.  Support Vector Machines for TCP traffic classification , 2009, Comput. Networks.

[31]  Scott E. Coull,et al.  Traffic Analysis of Encrypted Messaging Services: Apple iMessage and Beyond , 2014, CCRV.

[32]  János Abonyi,et al.  Fuzzy Clustering Based Segmentation of Time-Series , 2003, IDA.

[33]  Nino Vincenzo Verde,et al.  Can't You Hear Me Knocking: Identification of User Actions on Android Apps via Traffic Analysis , 2014, CODASPY.

[34]  Andrew W. Moore,et al.  Bayesian Neural Networks for Internet Traffic Classification , 2007, IEEE Transactions on Neural Networks.

[35]  Feng Qian,et al.  Profiling resource usage for mobile applications: a cross-layer approach , 2011, MobiSys '11.

[36]  Sang Pil Han,et al.  An Empirical Analysis of User Content Generation and Usage Behavior on the Mobile Internet , 2011, Manag. Sci..