A Comprehensive Review and Open Challenges of Stream Big Data

Big data research has become a leading area in the field of information systems. Data streams are a well-studied problem in traditional data mining, but they still need exploration in the big data setting. This paper reviews the research activities, scientific practices, and methods that have been developed for stream big data. In addition, it examines well-known real-time platforms that are evolving to handle streaming problems and that resemble platforms for non-real-time data in their use of main memory and distributed computing technologies. Finally, it summarizes the open issues and challenges that current technologies face in acquiring and processing big data in real time.
