Learning Dynamics of Decision Boundaries without Additional Labeled Data

We propose a method for learning the dynamics of the decision boundary in order to maintain classification performance without additional labeled data. In many applications, such as spam-mail classification, the decision boundary changes dynamically over time. As a result, the performance of classifiers deteriorates quickly unless they are retrained with additional labeled data. However, continuously preparing such data is expensive or even impossible. The proposed method alleviates this deterioration by using newly obtained unlabeled data, which are easy to prepare, together with labeled data collected beforehand. In the proposed method, the dynamics of the decision boundary are modeled by Gaussian processes. To extract information about the decision boundaries from unlabeled data, the method assumes the low-density separation criterion: the decision boundary should not cross high-density regions but should instead lie in low-density regions. We incorporate this criterion into our framework in a principled manner by introducing entropy posterior regularization on the posterior of the classifier parameters, based on the generic regularized Bayesian framework. We develop an efficient inference algorithm for the model based on variational Bayesian inference. The effectiveness of the proposed method is demonstrated through experiments using two synthetic and four real-world data sets.
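The abstract describes two ingredients: a Gaussian-process model of how classifier parameters evolve over time, and an entropy penalty that pushes the boundary into low-density regions of the unlabeled data. The following is a minimal, hypothetical sketch of how those two ideas interact, under simplifying assumptions that are not taken from the paper: a linear logistic classifier, one independent GP per weight dimension with a shared RBF kernel and constant prior mean, and a point estimate found by gradient descent with backtracking instead of the authors' variational Bayesian inference. The function names (`gp_predict_weights`, `mean_entropy`, `refine_weights`) and the parameters `lengthscale`, `noise`, and `lam` are illustrative only.

```python
import numpy as np


def rbf_kernel(t1, t2, lengthscale=2.0, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of time stamps."""
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_predict_weights(times, weights, t_new, noise=1e-2):
    """Forecast classifier weights at t_new with one GP per weight dimension.

    times: (T,) time stamps; weights: (T, D) weights fitted on past labeled data.
    Assumption for this sketch: independent GPs sharing an RBF kernel, with a
    constant prior mean equal to the empirical mean of the weights.
    Returns the posterior mean (D,) and the shared posterior variance (scalar).
    """
    times = np.asarray(times, dtype=float)
    prior_mean = weights.mean(axis=0)
    K = rbf_kernel(times, times) + noise * np.eye(len(times))
    t_star = np.array([t_new], dtype=float)
    k_star = rbf_kernel(t_star, times)  # shape (1, T)
    mean = prior_mean + (k_star @ np.linalg.solve(K, weights - prior_mean))[0]
    var = rbf_kernel(t_star, t_star)[0, 0] - (k_star @ np.linalg.solve(K, k_star.T))[0, 0]
    return mean, max(float(var), 1e-8)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def mean_entropy(w, X):
    """Average binary entropy of logistic predictions on unlabeled inputs X."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)
    return float(np.mean(-p * np.log(p) - (1.0 - p) * np.log(1.0 - p)))


def objective(w, mu, var, X, lam):
    """Gaussian prior centred on the GP forecast plus an entropy penalty on X."""
    return float(np.sum((w - mu) ** 2) / (2.0 * var) + lam * mean_entropy(w, X))


def refine_weights(mu, var, X, lam=1.0, iters=200, step=1.0):
    """Shift w away from dense regions of X by gradient descent with backtracking."""
    w = mu.copy()
    f = objective(w, mu, var, X, lam)
    for _ in range(iters):
        z = X @ w
        p = sigmoid(z)
        # For H(p) with p = sigmoid(z): dH/dz = -z * p * (1 - p).
        grad_ent = (X * (-z * p * (1.0 - p))[:, None]).mean(axis=0)
        grad = (w - mu) / var + lam * grad_ent
        t = step
        while t > 1e-10:
            w_new = w - t * grad
            f_new = objective(w_new, mu, var, X, lam)
            if f_new < f:  # accept only steps that decrease the objective
                w, f = w_new, f_new
                break
            t *= 0.5
        else:
            break  # no decreasing step found; stop early
    return w


if __name__ == "__main__":
    # Hypothetical weights of a 2-D linear classifier whose boundary rotates.
    times = np.arange(5)
    angles = 0.2 * times
    weights = 3.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)
    mu, var = gp_predict_weights(times, weights, t_new=5)
    rng = np.random.default_rng(0)
    X_u = np.vstack([rng.normal([2, 2], 0.5, (100, 2)),
                     rng.normal([-2, -2], 0.5, (100, 2))])
    w = refine_weights(mu, var, X_u, lam=1.0)
    print("forecast:", mu, "refined:", w)
```

The GP variance controls the trade-off: where the forecast is uncertain (far from observed time steps), the prior term is weak and the entropy term on unlabeled data has more influence over where the boundary ends up.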
