Machine Learning based Video Hosting Site Identification Method for MVNO Networks

Zero-rating services provided by Mobile Virtual Network Operators (MVNOs) have been attracting smartphone users who frequently watch web videos delivered by bandwidth-intensive applications. As the share of encrypted traffic grows, MVNOs must identify the video hosting sites that smartphone users access by analyzing encrypted traffic in order to offer such services. If video hosting sites are misidentified, so that traffic from permitted sites is attributed to non-permitted sites or vice versa, either MVNOs or subscribers incur unjustified charges. In this paper, we propose two feature sets that take multiple-flow transmission into account, and we analyze them with supervised machine learning to identify video hosting sites accurately. The first set consists of traffic features extracted from the flows that transmit only video content. The second is the 4-tuple distribution of the flows established to transmit the various contents of a single video web page. These feature sets are based on our investigation of the characteristics of four of the most popular video hosting sites in Japan. In video hosting site identification experiments, single-flow analysis reaches an identification accuracy of 85.9%, whereas the proposed method reaches 92.0%.
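The abstract does not define the two feature sets in detail, so the following Python sketch only illustrates one possible reading of them. The `Flow` record layout, the feature names, and the 1 MB threshold used to guess which flows carry video are all assumptions made for illustration; they are not taken from the paper.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Flow(NamedTuple):
    """One established flow, keyed by its 4-tuple (hypothetical record layout)."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    total_bytes: int


def video_flow_features(flows: List[Flow], min_bytes: int = 1_000_000) -> Dict[str, float]:
    """Traffic features from flows assumed to carry video.

    The byte threshold is an illustrative assumption, not the paper's rule.
    """
    video = [f for f in flows if f.total_bytes >= min_bytes]
    total = sum(f.total_bytes for f in video)
    return {
        "n_video_flows": float(len(video)),
        "video_bytes": float(total),
        "mean_video_bytes": total / len(video) if video else 0.0,
    }


def four_tuple_features(flows: List[Flow]) -> Dict[str, float]:
    """Summarize how the flows of one page view spread over 4-tuples."""
    if not flows:
        return {"n_flows": 0.0, "n_servers": 0.0,
                "top_server_share": 0.0, "port443_share": 0.0}
    per_server = Counter(f.dst_ip for f in flows)  # flows per destination IP
    n = len(flows)
    return {
        "n_flows": float(n),
        "n_servers": float(len(per_server)),
        "top_server_share": max(per_server.values()) / n,
        "port443_share": sum(f.dst_port == 443 for f in flows) / n,
    }


# Example: four flows observed while one video page is loaded.
page_flows = [
    Flow("10.0.0.1", 50000, "203.0.113.5", 443, 5_000_000),
    Flow("10.0.0.1", 50001, "203.0.113.5", 443, 20_000),
    Flow("10.0.0.1", 50002, "198.51.100.7", 80, 3_000),
    Flow("10.0.0.1", 50003, "198.51.100.9", 443, 2_000_000),
]
features = {**video_flow_features(page_flows), **four_tuple_features(page_flows)}
```

Under these assumptions, one such feature dictionary per page view, labeled with the site that was visited, could serve as a training example for a supervised classifier.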
