Semi-supervised Learning with Concept Drift Using Particle Dynamics Applied to Network Intrusion Detection Data

Concept drift, which refers to learning problems that are nonstationary over time, is of increasing importance in machine learning and data mining. Many concept drift applications require fast response, which means an algorithm must always be (re)trained with the latest available data. However, data labeling is usually expensive and/or time consuming compared to the acquisition of unlabeled data, so usually only a small fraction of the incoming data can be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. Most of them, however, assume that the data is static. Semi-supervised learning under concept drift is therefore still an open and challenging task in machine learning. Recently, a particle competition and cooperation approach was developed to perform graph-based semi-supervised learning from static data. We have extended that approach to handle data streams and concept drift. The result is a passive algorithm that uses a single classifier and adapts naturally to concept changes without any explicit drift detection mechanism. It has built-in mechanisms that provide a natural way of learning from new data, gradually "forgetting" older knowledge as older data items become less useful for the classification of newer data items. The proposed algorithm is applied to the KDD Cup 1999 network intrusion data, showing its effectiveness.
