MODS: Multiple One-class Data Streams Learning from Homogeneous Data

This paper presents a novel approach, called MODS, for learning an accurate time-evolving classifier from multiple one-class data streams. MODS works in two steps. In the first step, we construct a local one-class classifier on the labeled positive examples of each sub-data stream. We then collect the informative examples (support vectors) around each local one-class classifier, since these examples support the classifier's decision boundary; we call this the support vector preservation principle. In the second step, we construct a global one-class classifier on the collected informative examples. By following the support vector preservation principle, MODS explicitly addresses the problem of building an accurate classifier from multiple one-class data streams. Extensive experiments on real-life data streams demonstrate that MODS achieves high performance and efficiency for learning from multiple one-class data streams in comparison with other approaches.
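A minimal Python sketch of the two-step procedure follows. It rests on our own assumptions and is not the MODS implementation: each one-class learner is stood in for by a hard-margin, linear-kernel SVDD (a minimum enclosing ball) fitted with the Badoiu-Clarkson iteration, and the "support vectors" preserved from each sub-stream are approximated as the training points lying within a fraction `margin` of the ball's boundary. The names `BallOneClass`, `boundary_points`, `mods_sketch`, `margin`, and `iters` are hypothetical and used for illustration only.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BallOneClass:
    """Hypothetical stand-in for a one-class classifier: an approximate
    minimum enclosing ball (hard-margin, linear-kernel SVDD) fitted with the
    Badoiu-Clarkson iteration. Not necessarily the learner used by MODS."""

    def __init__(self, iters: int = 100) -> None:
        self.iters = iters
        self.center: List[float] = []
        self.radius = 0.0

    def fit(self, points: Sequence[Point]) -> "BallOneClass":
        c = list(points[0])
        for i in range(1, self.iters + 1):
            # Move the center toward the currently farthest point with step 1/(i+1).
            far = max(points, key=lambda p: _dist(p, c))
            c = [ci + (fi - ci) / (i + 1) for ci, fi in zip(c, far)]
        self.center = c
        # The radius is the distance to the farthest training point, so every
        # training point is accepted by predict().
        self.radius = max(_dist(p, c) for p in points)
        return self

    def predict(self, x: Point) -> bool:
        """True if x falls inside the ball, i.e. is judged a positive example."""
        return _dist(x, self.center) <= self.radius

    def boundary_points(self, points: Sequence[Point], margin: float = 0.1) -> List[Point]:
        """Assumed proxy for support vectors: points within margin * radius of the boundary."""
        cutoff = (1.0 - margin) * self.radius
        return [p for p in points if _dist(p, self.center) >= cutoff]


def mods_sketch(streams: Sequence[Sequence[Point]], margin: float = 0.1,
                iters: int = 100) -> Tuple[BallOneClass, List[Point]]:
    """Step 1: fit a local model per sub-stream and keep its boundary points.
    Step 2: fit one global model on the union of the kept points."""
    kept: List[Point] = []
    for chunk in streams:
        local = BallOneClass(iters).fit(chunk)
        kept.extend(local.boundary_points(chunk, margin))
    return BallOneClass(iters).fit(kept), kept


# Toy usage: two sub-streams of 2-D positive examples from the same region.
streams = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (0.2, 0.1)],
    [(1.0, 1.0), (-1.0, -1.0), (0.3, -0.2), (0.1, 0.1)],
]
model, kept = mods_sketch(streams)
print(len(kept), model.predict((0.0, 0.0)), model.predict((10.0, 10.0)))
```

In this toy run the global model is trained only on the preserved boundary points; interior examples such as (0.2, 0.1) are dropped, mirroring the idea that only the examples supporting each local boundary are carried forward. A kernelized or soft-margin one-class learner could be swapped in for `BallOneClass` without changing the two-step structure.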
