Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop

Network traffic monitoring and analysis is of theoretical and practical significance for optimizing network resources and improving user experience. However, existing solutions, which usually rely on a high-performance server with large storage capacity, do not scale to detailed analysis of large volumes of traffic data. In this article, we present a traffic monitoring and analysis system for large-scale networks based on Hadoop, an open-source distributed computing platform for big data processing on commodity hardware. The system has been deployed in the core network of a large cellular network and extensively evaluated. The results demonstrate that it can efficiently process 4.2 TB of traffic data per day from 123 Gb/s links, with high performance at low cost.
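The abstract does not describe the system's analysis jobs. The block below is a minimal, single-process sketch of the MapReduce model that Hadoop executes. It assumes a hypothetical comma-separated flow-record format (`src_ip,dst_ip,dst_port,bytes`) and a hypothetical job that totals bytes per destination port. It is not the authors' implementation and does not call the Hadoop API. It only simulates the map, shuffle, and reduce phases in memory.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical input: one flow record per line, "src_ip,dst_ip,dst_port,bytes".
# The real system's record format is not given in the abstract.


def map_phase(line: str) -> Iterator[tuple[int, int]]:
    """Emit (dst_port, byte_count) for each well-formed record."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        return  # skip malformed lines instead of failing the whole job
    _src, _dst, port, nbytes = fields
    yield int(port), int(nbytes)


def shuffle(pairs: Iterable[tuple[int, int]]) -> dict[int, list[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    groups: dict[int, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: int, values: list[int]) -> tuple[int, int]:
    """Sum the byte counts collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[int, int]:
    """Chain map -> shuffle -> reduce over an in-memory list of records."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


records = [
    "10.0.0.1,93.184.216.34,443,1500",
    "10.0.0.2,93.184.216.34,443,900",
    "10.0.0.3,198.51.100.7,80,400",
    "garbage line",
]
print(run_job(records))  # {443: 2400, 80: 400}
```

On a real Hadoop cluster, the map function would run in parallel over HDFS blocks spread across commodity nodes, and the framework would perform the shuffle over the network. Because each reduce call sees only one key, the work divides across machines without a single large server. This is the scalability property the abstract contrasts with server-based solutions.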
