Online transfer learning with multiple decision trees

Online learning techniques have been widely used in many fields where instances arrive one by one. However, in the early stage of a data stream, online learning models cannot achieve good classification accuracy because they have not yet collected enough instances to learn from. For example, the well-known very fast decision tree (VFDT) algorithm must wait until the Hoeffding bound is satisfied before it splits a node, which leads to poor classification accuracy at the beginning of the data stream. Thus, VFDT may not be appropriate for real applications that demand fast and accurate online detection. The problem becomes even more serious in data stream classification with concept drift. This paper uses transfer learning to remedy this shortcoming of VFDT. To achieve this goal, a new decision tree method named VFDT-D is first proposed; it caches instances in its leaf nodes to handle numerical attributes and to fit the online transfer learning (OTL) framework. Next, a measure that considers the tree path, classification accuracy and classification confidence is proposed to evaluate the local similarity between source and target domain classifiers. Finally, a multiple-source online transfer learning algorithm named DMOTL is proposed, which takes VFDT-D as its base classifier and uses the proposed local similarity measure to select the optimal source domain classifier to assist transfer learning. Extensive experiments on several synthetic and real-world datasets demonstrate the advantage of the proposed algorithm.
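To make the cold-start problem concrete, the sketch below implements the standard Hoeffding-bound split test used by VFDT: a leaf is split only when the observed gain gap between the two best attributes exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the split measure (log2 of the number of classes for information gain) and n is the number of instances seen at the leaf. This is a minimal illustration, not VFDT-D or DMOTL; the function names are hypothetical, and the tie-breaking threshold and grace period used by practical VFDT implementations are deliberately omitted.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a variable
    with range `value_range` lies within epsilon of its mean over n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """VFDT-style test (hypothetical helper): split the leaf only if the gain gap
    between the two best attributes exceeds the Hoeffding bound."""
    return best_gain - second_gain > hoeffding_bound(value_range, delta, n)


def min_instances_to_split(gain_gap: float, value_range: float, delta: float) -> int:
    """Smallest n for which a fixed gain gap passes the test, obtained by solving
    gain_gap > sqrt(R^2 ln(1/delta) / (2n)) for n (strict inequality)."""
    n_real = value_range ** 2 * math.log(1.0 / delta) / (2.0 * gain_gap ** 2)
    return math.floor(n_real) + 1


if __name__ == "__main__":
    # Binary classification with information gain: R = log2(2) = 1.
    # With delta = 1e-7 and a true gain gap of 0.05, the leaf cannot split
    # until it has seen this many instances:
    print(min_instances_to_split(0.05, 1.0, 1e-7))  # 3224
```

Even with a clear winner among attributes, a leaf must accumulate thousands of instances before its first split, and until then it predicts with little more than the majority class. This is the early-stream gap that the abstract proposes to fill by transferring knowledge from source-domain trees.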
