Efficient mining of high-speed uncertain data streams

Most existing algorithms for data stream classification are designed for precise and complete data. However, data in many real-life applications are inherently uncertain because of instrument inaccuracy, wireless transmission errors, and similar causes. We propose UELM-MapReduce, a parallel ensemble classifier based on the Extreme Learning Machine (ELM) and MapReduce, for handling uncertain data streams. The ELM-based ensemble is trained in parallel from sequential training chunks of the uncertain data stream. The weight of each base classifier in the ensemble is adjusted according to its mean square error on the most recent test chunk, and the classifier with the lowest accuracy is replaced. As a result, UELM-MapReduce can classify uncertain data streams efficiently and accurately while handling concept drift. Experimental results show that UELM-MapReduce outperforms other methods in both prediction accuracy and computational efficiency.
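The chunk-wise ensemble update described above can be illustrated with a minimal single-machine sketch. It is not the UELM-MapReduce implementation: it omits both the MapReduce parallelism and any modelling of data uncertainty, and several details are assumptions made only for illustration. These are the class `ELM`, the functions `update_ensemble` and `predict`, the sigmoid hidden layer, the inverse-MSE weighting formula, and the choice to train the replacement classifier on the latest training chunk.

```python
import numpy as np


class ELM:
    """Basic ELM: random hidden layer, output weights solved by least squares."""

    def __init__(self, n_hidden, n_classes, rng):
        self.n_hidden = n_hidden
        self.n_classes = n_classes
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activations of the randomly initialised hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        targets = np.eye(self.n_classes)[y]  # one-hot class targets
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def scores(self, X):
        return self._hidden(X) @ self.beta


def mse(model, X, y):
    targets = np.eye(model.n_classes)[y]
    return float(np.mean((model.scores(X) - targets) ** 2))


def accuracy(model, X, y):
    return float(np.mean(np.argmax(model.scores(X), axis=1) == y))


def update_ensemble(models, X_train, y_train, X_test, y_test, rng):
    """Replace the least accurate base classifier, then reweight by MSE.

    Assumption: weights are proportional to 1 / MSE on the test chunk;
    the weighting formula used by the paper is not stated in the abstract.
    """
    accs = [accuracy(m, X_test, y_test) for m in models]
    worst = int(np.argmin(accs))
    old = models[worst]
    # Assumption: the replacement is trained on the latest training chunk.
    models[worst] = ELM(old.n_hidden, old.n_classes, rng).fit(X_train, y_train)
    errors = np.array([mse(m, X_test, y_test) for m in models])
    weights = 1.0 / (errors + 1e-12)
    return models, weights / weights.sum()


def predict(models, weights, X):
    # Weighted sum of per-class scores, then pick the highest-scoring class.
    total = sum(w * m.scores(X) for w, m in zip(weights, models))
    return np.argmax(total, axis=1)
```

Calling `update_ensemble` once per incoming chunk keeps the ensemble size fixed while shifting weight toward the base classifiers that fit the most recent data, which is the mechanism the abstract relies on for handling concept drift.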
