Recent Trends in Distributed Online Stream Processing Platform for Big Data: Survey

Big data has become an important source of information and knowledge, especially for large, highly profitable companies such as Facebook and Amazon. Handling data at this scale is difficult, however, and several techniques have been developed to analyze it. Many of these techniques process big data and support decisions through offline batch analysis. Today, decisions increasingly need to be made from online analysis of streaming data. Stream computing is gaining interest because it enables real-time data analytics. The objective of this paper is to give an in-depth analysis of efficient big data streaming analysis platforms and to provide solutions for some online big data processing problems. Additionally, some recent works in big data streaming are highlighted.
