Solving Nonstationary Classification Problems With Coupled Support Vector Machines

Many learning problems vary slowly over time; this is the case, in particular, for several critical real-world applications. When facing such problems, it is desirable that the learning method not only find the correct input-output function but also detect changes in the concept and adapt to them. We introduce the time-adaptive support vector machine (TA-SVM), a new method for generating adaptive classifiers that are capable of learning concepts that change with time. The basic idea of TA-SVM is to use a sequence of classifiers, each one appropriate for a small time window but, in contrast to other proposals, to learn all the hyperplanes in a global way. We show that adding a new term to the cost function of the set of SVMs, one that penalizes the diversity between consecutive classifiers, couples the sequence and allows TA-SVM to learn as a single adaptive classifier. We evaluate different aspects of the method using appropriate drifting problems. In particular, we analyze the regularizing effect of changing the number of classifiers in the sequence or of adapting the strength of the coupling. A comparison with other methods on several problems, including the well-known STAGGER dataset and the real-world electricity pricing domain, shows the good performance of TA-SVM in all tested situations.
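
To make the coupling idea concrete, the following is a minimal sketch in Python (NumPy only) of jointly training one linear classifier per time window, with a squared-difference penalty tying consecutive hyperplanes together. The function name ta_svm_train, the hyperparameters C and gamma, and the plain subgradient-descent solver are illustrative assumptions for this sketch, not the exact formulation or optimizer used in the paper.

```python
import numpy as np

def ta_svm_train(windows, C=1.0, gamma=0.1, lr=0.01, epochs=200):
    """Jointly train one linear classifier per time window, coupling
    consecutive hyperplanes with a squared-difference penalty.

    windows: list of (X, y) pairs, X of shape (n_t, d), labels y in {-1, +1}.
    Minimizes by (sub)gradient descent (a sketch, not the paper's solver):
        sum_t [ 0.5*||w_t||^2 + C * hinge_loss(X_t, y_t; w_t, b_t) ]
        + gamma * sum_t ||w_{t+1} - w_t||^2
    """
    T = len(windows)
    d = windows[0][0].shape[1]
    W = np.zeros((T, d))   # one weight vector per time window
    b = np.zeros(T)        # one bias per time window

    for _ in range(epochs):
        gW = np.zeros_like(W)
        gb = np.zeros_like(b)
        for t, (X, y) in enumerate(windows):
            margins = y * (X @ W[t] + b[t])
            viol = margins < 1.0                        # hinge-active samples
            gW[t] = W[t] - C * (y[viol, None] * X[viol]).sum(axis=0)
            gb[t] = -C * y[viol].sum()
        # Coupling term: penalize differences between consecutive classifiers,
        # so the whole sequence is learned globally rather than independently.
        gW[:-1] += 2.0 * gamma * (W[:-1] - W[1:])
        gW[1:]  += 2.0 * gamma * (W[1:] - W[:-1])
        W -= lr * gW
        b -= lr * gb
    return W, b
```

Setting gamma to zero decouples the windows into independent SVMs, while a very large gamma forces all hyperplanes toward a single static classifier; tuning the coupling strength between these extremes is the regularizing trade-off the abstract refers to.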
