Big-DAMA: Big Data Analytics for Network Traffic Monitoring and Analysis

The complexity of the Internet has dramatically increased in the last few years, making it more important and challenging to design scalable Network Traffic Monitoring and Analysis (NTMA) applications and tools. Critical NTMA applications such as the detection of anomalies, network attacks and intrusions, require fast mechanisms for online analysis of thousands of events per second, as well as efficient techniques for offline analysis of massive historical data. We are witnessing a major development in Big Data Analysis Frameworks (BDAFs), but the application of BDAFs and scalable analysis techniques to the NTMA domain remains poorly understood and only in-house and difficult to benchmark solutions are conceived. In this position paper we describe the basis of the Big-DAMA research project, which aims at tackling this growing need by benchmarking and developing novel scalable techniques and frameworks capable to analyze both online network traffic data streams and offline massive traffic datasets.

[1]  Prashant J. Shenoy,et al.  SCALLA: A Platform for Scalable One-Pass Analytics Using MapReduce , 2012, TODS.

[2]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[3]  Marco Mellia,et al.  Large-scale network traffic monitoring with DBStream, a system for rolling big data analysis , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[4]  Youngseok Lee,et al.  Toward scalable internet traffic measurement and analysis with Hadoop , 2013, CCRV.

[5]  Lu Liu,et al.  Muppet: MapReduce-Style Processing of Fast Data , 2012, Proc. VLDB Endow..

[6]  Michael Stonebraker,et al.  SQL databases v. NoSQL databases , 2010, CACM.

[7]  Kensuke Fukuda,et al.  Hashdoop: A MapReduce framework for network anomaly detection , 2014, 2014 IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS).

[8]  Tom White,et al.  Hadoop: The Definitive Guide , 2009 .

[9]  Giovane C. M. Moura,et al.  ENTRADA: A high-performance network traffic data streaming warehouse , 2016, NOMS 2016 - 2016 IEEE/IFIP Network Operations and Management Symposium.

[10]  Theodore Johnson,et al.  Gigascope: a stream database for network applications , 2003, SIGMOD '03.

[11]  Sudipto Guha,et al.  Clustering data streams , 2000, Proceedings 41st Annual Symposium on Foundations of Computer Science.

[12]  Theodore Johnson,et al.  Stream warehousing with DataDepot , 2009, SIGMOD Conference.

[13]  Qiang Chen,et al.  Aurora : a new model and architecture for data stream management ) , 2006 .

[14]  Scott Shenker,et al.  Spark: Cluster Computing with Working Sets , 2010, HotCloud.

[15]  Pramod Bhatotia,et al.  Incoop: MapReduce for incremental computations , 2011, SoCC.

[16]  Scott Shenker,et al.  Discretized Streams: An Efficient and Fault-Tolerant Model for Stream Processing on Large Clusters , 2012, HotCloud.

[17]  Feng Liu,et al.  Monitoring and analyzing big traffic data of a large-scale cellular network with Hadoop , 2014, IEEE Network.

[18]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  Data stream clustering: A survey , 2013, CSUR.

[19]  Sudipto Guha,et al.  Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng..

[20]  Albert Bifet,et al.  DATA STREAM MINING A Practical Approach , 2009 .

[21]  References , 1971 .

[22]  Michael Hahsler,et al.  Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R , 2017 .