Learning on High Frequency Stock Market Data Using Misclassified Instances in Ensemble

Learning from non-stationary distributions is a challenging problem in machine learning and data mining, because the joint probability distribution between the data and the classes changes over time. Many real-time problems suffer from concept drift because they change with time. For example, in the stock market, a customer's behavior may change depending on the season of the year and on inflation. Concept drift can occur in the stock market for a number of reasons: for example, traders' preferences for stocks change over time, and increases in a stock's value may be followed by decreases. The objective of this paper is to develop an ensemble-based classification algorithm for non-stationary data streams that takes misclassified instances into account during the learning process. In addition, we present an exhaustive comparison of the proposed algorithm with state-of-the-art classification approaches using different evaluation measures such as recall, F-measure, and G-mean.
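As an illustration of the evaluation measures named above, the following minimal sketch computes recall, F-measure, and G-mean for a binary task using their standard textbook definitions. The function name `binary_metrics`, the label encoding (1 as the positive class), and the toy labels are assumptions made for this example; they are not taken from the paper's experiments.

```python
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return recall, precision, specificity, F-measure and G-mean.

    Hypothetical helper for illustration; uses the standard definitions,
    with G-mean taken as sqrt(recall * specificity).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Guard each ratio against an empty denominator.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    g_mean = math.sqrt(recall * specificity)

    return {
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


# Toy labels (assumed): 1 = price goes up, 0 = price does not go up.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(y_true, y_pred)
```

Unlike plain accuracy, G-mean drops to zero whenever either class is entirely misclassified, which is one common reason it is reported alongside recall and F-measure on streams where the class balance may shift.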
