Hadoop based Deep Packet Inspection system for traffic analysis of e-business websites

Internet traffic is growing explosively, and online shopping is one of its significant drivers. Network operators, unwilling to be dumb pipes, are making every effort to mine this mass traffic with the help of Deep Packet Inspection (DPI). DPI is a big challenge for massive data when traditional methods and programming models are used. Hadoop provides an alternative approach with its strengths in distributed storage and parallel computing. In this paper, a Hadoop-based DPI system integrated with a web crawler is reported. The system architecture and the MapReduce models for packet analysis and web URL restoration are presented. As an example, live web traffic to Tmall, the leading e-shopping giant in China, is investigated using this system. The popularity of products, categories and brands over a certain period is evaluated from product page views. Detailed product information is provided by the product information base built by the web crawler. This work explores the methodology of using Hadoop in DPI and presents guidelines for developing such a system, which network operators can further use to analyze other services and mine the value of network traffic.
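
As an illustration of how product popularity might be derived from page views, the following is a minimal single-process sketch of a map step and a reduce step in Python. It is not the system's actual MapReduce model. The host name `detail.tmall.com`, the `id` query parameter, and the function names are assumptions made for this example only, and the input is assumed to be a list of URLs that have already been restored from captured packets.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Assumption for this sketch: product pages live on this host and carry
# the product identifier in an "id" query parameter.
PRODUCT_HOST = "detail.tmall.com"


def map_page_view(url):
    """Map step: emit (product_id, 1) for a product-page URL, else nothing."""
    parsed = urlparse(url)
    if parsed.netloc != PRODUCT_HOST:
        return []
    ids = parse_qs(parsed.query).get("id", [])
    return [(ids[0], 1)] if ids else []


def reduce_page_views(pairs):
    """Reduce step: sum the emitted counts for each product id."""
    counts = Counter()
    for product_id, n in pairs:
        counts[product_id] += n
    return counts


def product_popularity(urls):
    """Run the map step over all URLs, then reduce to page views per product."""
    pairs = [pair for url in urls for pair in map_page_view(url)]
    return reduce_page_views(pairs)
```

Calling `product_popularity(urls).most_common(10)` would then rank the ten most viewed products in the sample. Category and brand popularity could be obtained the same way after joining each product id with a product information base.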
