Minimax Classification under Concept Drift with Multidimensional Adaptation and Performance Guarantees

In practical supervised classification, the statistical characteristics of instance-label pairs often change over time. Conventional learning techniques adapt to such concept drift by accounting for a scalar rate of change through a carefully chosen learning rate, forgetting factor, or window size. However, time changes in common scenarios are multidimensional, i.e., different statistical characteristics often change in different ways. This paper presents adaptive minimax risk classifiers (AMRCs) that account for multidimensional time changes by means of multivariate and high-order tracking of the time-varying underlying distribution. In addition, unlike conventional techniques, AMRCs can provide computable tight performance guarantees. Experiments on multiple benchmark datasets show the classification improvement of AMRCs over the state of the art and the reliability of the presented performance guarantees.
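
The contrast between a scalar and a multidimensional rate of change can be illustrated with a toy sketch. The code below is not the AMRC algorithm; it is a simplified stand-in that tracks a two-component mean vector with an exponentially weighted average, once with a single forgetting factor shared by all components and once with one forgetting factor per component. The function name `track_mean`, the synthetic data, and the chosen forgetting factors are all assumptions made for illustration.

```python
import numpy as np


def track_mean(samples, forgetting):
    """Exponentially weighted running mean of `samples` (shape: T x d).

    `forgetting` is either a scalar shared by all components or an array
    with one forgetting factor per component (hypothetical helper).
    """
    forgetting = np.broadcast_to(
        np.asarray(forgetting, dtype=float), samples.shape[1:]
    )
    estimate = samples[0].astype(float)
    estimates = [estimate]
    for x in samples[1:]:
        # Old estimate is down-weighted component by component.
        estimate = forgetting * estimate + (1.0 - forgetting) * x
        estimates.append(estimate)
    return np.array(estimates)


# Synthetic stream (assumed): component 0 is static, component 1 drifts linearly.
rng = np.random.default_rng(0)
T = 500
t = np.arange(T)
true_mean = np.stack([np.full(T, 1.0), 0.05 * t], axis=1)
samples = true_mean + rng.normal(scale=0.5, size=(T, 2))

# One rate for everything versus one rate per component.
scalar_est = track_mean(samples, 0.9)
vector_est = track_mean(samples, np.array([0.9, 0.6]))

# Mean squared tracking error of each component under both settings.
scalar_err = np.mean((scalar_est - true_mean) ** 2, axis=0)
vector_err = np.mean((vector_est - true_mean) ** 2, axis=0)
print("scalar forgetting factor:", scalar_err)
print("per-component factors:   ", vector_err)
```

In this sketch a shared forgetting factor must compromise between the static and the drifting component, whereas per-component factors let the drifting component forget faster. AMRCs go further than this toy example by tracking the time-varying distribution with multivariate and high-order models.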
