Concept Neurons - Handling Drift Issues for Real-Time Industrial Data Mining

Learning from data streams is a challenge faced by data science professionals across multiple industries. Many of them struggle to apply traditional Machine Learning algorithms to these problems, a choice driven by the wide availability of such algorithms in ready-to-use software libraries for big data technologies (e.g. SparkML). Nevertheless, most of these algorithms cannot cope with the key characteristics of this type of data, such as high arrival rates and/or non-stationary distributions. In this paper, we introduce a generic yet simple framework to fill this gap, called Concept Neurons. It leverages a combination of continuous inspection schemes and residual-based updates over the model parameters and/or the model output. Such a framework can strengthen the resistance of most inductive learning algorithms to concept drift. Two distinct yet closely related flavors are introduced to handle different drift types. Experimental results from successful applications in different domains across the transportation industry are presented to uncover the hidden potential of this methodology.
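To make the idea of a residual-based update over the model output concrete, the following is a minimal sketch and not the implementation described in the paper. It assumes a fixed base predictor whose output is shifted by an exponentially smoothed estimate of its recent residuals, plus a simple threshold check standing in for a continuous inspection scheme. The class name `ResidualCorrector`, the parameters `alpha` and `threshold`, and the counter `drift_flags` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ResidualCorrector:
    """Wraps a fixed predictor and shifts its output by a running residual estimate.

    Illustrative only: names and update rule are assumptions, not the paper's method.
    """
    predict_fn: Callable[[float], float]  # hypothetical pre-trained base model
    alpha: float = 0.3        # assumed smoothing factor for the correction
    threshold: float = 1.0    # assumed inspection threshold on |residual|
    bias: float = 0.0         # current additive correction on the model output
    drift_flags: int = 0      # number of samples on which the inspection fired

    def predict(self, x: float) -> float:
        # Base prediction plus the learned output correction.
        return self.predict_fn(x) + self.bias

    def update(self, x: float, y: float) -> None:
        # Residual of the corrected prediction against the observed target.
        residual = y - self.predict(x)
        # Inspection step: flag samples where the corrected model still misses badly.
        if abs(residual) > self.threshold:
            self.drift_flags += 1
        # Residual-based update: move the correction a fraction of the way
        # toward the value that would have removed this residual.
        self.bias += self.alpha * residual


if __name__ == "__main__":
    model = ResidualCorrector(predict_fn=lambda x: 2.0 * x)
    # Synthetic stream: the target shifts by +5 halfway through (an abrupt drift).
    for t in range(100):
        x = float(t)
        y = 2.0 * x + (5.0 if t >= 50 else 0.0)
        model.update(x, y)
    print(round(model.bias, 3), model.drift_flags)
```

In this sketch the base model is never retrained; only the additive correction adapts, so it can only absorb drifts that look like a shift in the output. Drifts that change the shape of the input-output relationship would need updates on the model parameters themselves, which the sketch does not attempt.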

[1]  Alípio Mário Jorge,et al.  Comparing state-of-the-art regression methods for long term travel time prediction , 2012, Intell. Data Anal..

[2]  Michel Ferreira,et al.  On Predicting the Taxi-Passenger Demand: A Real-Time Approach , 2013, EPIA.

[3]  Yehuda Koren,et al.  Collaborative filtering with temporal dynamics , 2009, KDD.

[4]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[5]  João Gama,et al.  Discretization from data streams: applications to histograms and data mining , 2006, SAC.

[6]  Francesco Alesiani,et al.  Drift3Flow: Freeway-Incident Prediction Using Real-Time Learning , 2015, 2015 IEEE 18th International Conference on Intelligent Transportation Systems.

[7]  Gerhard Widmer,et al.  Learning in the Presence of Concept Drift and Hidden Contexts , 1996, Machine Learning.

[8]  João Mendes-Moreira,et al.  An Incremental Probabilistic Model to Predict Bus Bunching in Real-Time , 2014, IDA.

[9]  Rob J Hyndman,et al.  A state space framework for automatic forecasting using exponential smoothing methods , 2002 .

[10]  João Gama,et al.  Predicting Taxi–Passenger Demand Using Streaming Data , 2013, IEEE Transactions on Intelligent Transportation Systems.

[11]  Vladimiro Miranda,et al.  Very Short-Term Wind Power Forecasting: State-of-the-Art , 2014 .

[12]  Mykola Pechenizkiy,et al.  Beating the baseline prediction in food sales: How intelligent an intelligent predictor is? , 2012, Expert Syst. Appl..

[13]  João Gama,et al.  A survey on concept drift adaptation , 2014, ACM Comput. Surv..

[14]  Saso Dzeroski,et al.  Learning model trees from evolving data streams , 2010, Data Mining and Knowledge Discovery.

[15]  Vladimiro Miranda,et al.  Wind power forecasting : state-of-the-art 2009. , 2009 .

[16]  Geoff Holmes,et al.  Active Learning With Drifting Streaming Data , 2014, IEEE Transactions on Neural Networks and Learning Systems.