Applying machine learning to big data streams: An overview of challenges

Processing stream data is becoming more important as new technologies and use cases emerge. Applying machine learning to stream data and processing it in real time raises several challenges; model changes, concept drift, and insufficient time to train models are a few examples. We outline big data characteristics and machine learning techniques drawn from the literature, and conclude with the available approaches and their drawbacks.
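To make one of these challenges, concept drift, more concrete, here is a minimal sketch of a detector that watches a stream learner's error rate and signals when that rate rises significantly. It follows the Drift Detection Method (DDM) of Gama et al. (2004), which is chosen here purely for illustration and is not necessarily an approach this overview evaluates. The class name `DriftDetector`, the 30-instance warm-up, and the 2σ/3σ warning and drift levels are assumptions of this sketch, as is the premise that each prediction's correctness (0/1) becomes available once the true label arrives.

```python
import math


class DriftDetector:
    """Hypothetical sketch of DDM-style concept drift detection.

    Tracks the running error rate p and its standard deviation s,
    remembers the lowest p + s seen so far, and reports 'warning' or
    'drift' when p + s exceeds that minimum by 2 or 3 standard deviations.
    """

    def __init__(self, min_instances=30, warning_level=2.0, drift_level=3.0):
        self.min_instances = min_instances  # assumed warm-up before any signal
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0  # running error rate (overwritten by the first update)
        self.s = 0.0  # standard deviation of the error rate
        self.p_min = float("inf")
        self.s_min = float("inf")

    def update(self, error: bool) -> str:
        """Feed one 0/1 prediction error; return 'stable', 'warning' or 'drift'."""
        self.n += 1
        # Incremental mean of the binary error indicator.
        self.p += (float(error) - self.p) / self.n
        self.s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_instances:
            return "stable"
        # Remember the best (lowest) error level observed so far.
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        if self.p + self.s > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start over so the statistics reflect the new concept
            return "drift"
        if self.p + self.s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"
```

Because the detector keeps only a few running statistics, each update costs constant time and memory, which matters when there is little time to train or retrain per instance. What the learner does on a "drift" signal (for example, retraining on instances buffered since the last "warning") is left to the surrounding system and is not shown.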
