Online Adaptive Decision Trees: Pattern Classification and Function Approximation

Recently we have shown that decision trees can be trained in the online adaptive mode, as online adaptive decision trees (OADT) (Basak, 2004), leading to better generalization scores. OADTs were, however, limited to two-class classification tasks with a given tree structure. In this article, we provide an architecture based on OADT, ExOADT, which can handle multiclass classification tasks and can perform function approximation. ExOADT is structurally similar to OADT, extended with a regression layer. We also show that ExOADT can not only adapt the local decision hyperplanes in the nonterminal nodes but also has the potential to smoothly change the structure of the tree depending on the data samples. We provide learning rules for ExOADT based on steepest gradient descent. We demonstrate experimentally the effectiveness of ExOADT in pattern classification and function approximation tasks. Finally, we briefly discuss the relationship of ExOADT to other classification models.

[1]  Aiko M. Hormann,et al.  Programs for Machine Learning. Part I , 1962, Inf. Control..

[2]  Carla E. Brodley,et al.  Multivariate decision trees , 2004, Machine Learning.

[3]  Manuela M. Veloso,et al.  Tree Based Discretization for Continuous State Space Reinforcement Learning , 1998, AAAI/IAAI.

[4]  Jen-Tzung Chien,et al.  Compact decision trees with cluster validity for speech recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Kristin P. Bennett,et al.  On support vector decision trees for database marketing , 1999, IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339).

[6]  M. J. D. Powell,et al.  Radial basis functions for multivariable interpolation: a review , 1987 .

[7]  Marti A. Hearst Trends & Controversies: Support Vector Machines , 1998, IEEE Intell. Syst..

[8]  Jan-Erik Strömberg,et al.  Neural trees-using neural nets in a tree classifier structure , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[9]  Bernhard Schölkopf,et al.  A tutorial on support vector regression , 2004, Stat. Comput..

[10]  Michael Riley,et al.  Some Applications of Tree-based Modelling to Speech and Language , 1989, HLT.

[11]  M. Golea,et al.  A Growth Algorithm for Neural Network Decision Trees , 1990 .

[12]  J. Ross Quinlan,et al.  Improved Use of Continuous Attributes in C4.5 , 1996, J. Artif. Intell. Res..

[13]  Simon Haykin,et al.  Neural Networks: A Comprehensive Foundation , 1998 .

[14]  J. Platt Sequential Minimal Optimization : A Fast Algorithm for Training Support Vector Machines , 1998 .

[15]  J. Nazuno Haykin, Simon. Neural networks: A comprehensive foundation, Prentice Hall, Inc. Segunda Edición, 1999 , 2000 .

[16]  M. Kanehisa,et al.  Expert system for predicting protein localization sites in gram‐negative bacteria , 1991, Proteins.

[17]  Larry D. Pyeatt,et al.  Decision Tree Function Approximation in Reinforcement Learning , 1999 .

[18]  Paul Horton,et al.  A Probabilistic Classification System for Predicting the Cellular Localization Sites of Proteins , 1996, ISMB.

[19]  Alberto Suárez,et al.  Globally Optimal Fuzzy Decision Trees for Classification and Regression , 1999, IEEE Trans. Pattern Anal. Mach. Intell..

[20]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[21]  Christopher J. Merz,et al.  UCI Repository of Machine Learning Databases , 1996 .

[22]  Yoon Ho Cho,et al.  A personalized recommender system based on web usage mining and decision tree induction , 2002, Expert Syst. Appl..

[23]  J. Friedman Greedy function approximation: A gradient boosting machine. , 2001 .

[24]  M. Buhmann Multivariate cardinal interpolation with radial-basis functions , 1990 .

[25]  D. Broomhead,et al.  Radial Basis Functions, Multi-Variable Functional Interpolation and Adaptive Networks , 1988 .

[26]  Trevor Hastie,et al.  Additive Logistic Regression : a Statistical , 1998 .

[27]  Paul E. Utgoff,et al.  Decision Tree Induction Based on Efficient Tree Restructuring , 1997, Machine Learning.

[28]  Oren Etzioni,et al.  Web document clustering: a feasibility demonstration , 1998, SIGIR '98.

[29]  F. Girosi,et al.  Networks for approximation and learning , 1990, Proc. IEEE.

[30]  David G. Stork,et al.  Pattern Classification (2nd ed.) , 1999 .

[31]  John Moody,et al.  Fast Learning in Networks of Locally-Tuned Processing Units , 1989, Neural Computation.

[32]  Shun-ichi Amari,et al.  Natural Gradient Works Efficiently in Learning , 1998, Neural Computation.

[33]  Cezary Z. Janikow,et al.  Fuzzy decision trees: issues and methods , 1998, IEEE Trans. Syst. Man Cybern. Part B.

[34]  Leo Breiman,et al.  Bagging Predictors , 1996, Machine Learning.

[35]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[36]  Thomas G. Dietterich,et al.  Efficient Value Function Approximation Using Regression Trees , 1999 .

[37]  Usama M. Fayyad,et al.  On the Handling of Continuous-Valued Attributes in Decision Tree Generation , 1992, Machine Learning.

[38]  Steven Salzberg,et al.  A Decision Tree System for Finding Genes in DNA , 1998, J. Comput. Biol..

[39]  Leo Breiman,et al.  Classification and Regression Trees , 1984 .

[40]  Catherine Blake,et al.  UCI Repository of machine learning databases , 1998 .

[41]  Shun-ichi Amari,et al.  A Theory of Adaptive Pattern Classifiers , 1967, IEEE Trans. Electron. Comput..

[42]  Nello Cristianini,et al.  An Introduction to Support Vector Machines and Other Kernel-based Learning Methods , 2000 .

[43]  J. Friedman Special Invited Paper-Additive logistic regression: A statistical view of boosting , 2000 .

[44]  Kevin Murphy,et al.  Bayes net toolbox for Matlab , 1999 .

[45]  Ron Kohavi,et al.  Lazy Decision Trees , 1996, AAAI/IAAI, Vol. 1.

[46]  David S. Broomhead,et al.  Multivariable Functional Interpolation and Adaptive Networks , 1988, Complex Syst..

[47]  Michael I. Jordan,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1994, Neural Computation.

[48]  Olcay Boz,et al.  Converting A Trained Neural Network To a Decision Tree DecText - Decision Tree Extractor , 2002, ICMLA.

[49]  Santosh S. Vempala,et al.  Efficient algorithms for online decision problems , 2005, Journal of computer and system sciences (Print).

[50]  Bernhard Schölkopf,et al.  On a Kernel-Based Method for Pattern Recognition, Regression, Approximation, and Operator Inversion , 1998, Algorithmica.

[51]  Simon Kasif,et al.  A System for Induction of Oblique Decision Trees , 1994, J. Artif. Intell. Res..

[52]  Stephen R. Garner,et al.  WEKA: The Waikato Environment for Knowledge Analysis , 1996 .

[53]  Roel Wieringa,et al.  An integrated framework for ought-to-be and ought-to-do constraints , 2004, Artificial Intelligence and Law.

[54]  T Poggio,et al.  Regularization Algorithms for Learning That Are Equivalent to Multilayer Networks , 1990, Science.

[55]  D. Rubinfeld,et al.  Hedonic housing prices and the demand for clean air , 1978 .

[56]  Karl Branting,et al.  A computational model of ratio decidendi , 2004, Artificial Intelligence and Law.

[57]  Donald Geman,et al.  Model-based classification trees , 2001, IEEE Trans. Inf. Theory.

[58]  W E Grimson,et al.  A computational theory of visual surface interpolation. , 1982, Philosophical transactions of the Royal Society of London. Series B, Biological sciences.

[59]  Tomaso Poggio,et al.  Computational vision and regularization theory , 1985, Nature.

[60]  J. Friedman,et al.  Multivariate adaptive regression splines , 1991 .

[61]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[62]  S. Albers Competitive Online Algorithms , 1996 .

[63]  Alexander J. Smola,et al.  Support Vector Method for Function Approximation, Regression Estimation and Signal Processing , 1996, NIPS.

[64]  Tomaso A. Poggio,et al.  Regularization Theory and Neural Networks Architectures , 1995, Neural Computation.

[65]  Ian H. Witten,et al.  Data mining: practical machine learning tools and techniques with Java implementations , 2002, SGMD.

[67]  S. Gunn Support Vector Machines for Classification and Regression , 1998 .

[68]  Yiming Yang,et al.  A Comparative Study on Feature Selection in Text Categorization , 1997, ICML.

[69]  Robert A. Jacobs,et al.  Hierarchical Mixtures of Experts and the EM Algorithm , 1993, Neural Computation.

[70]  A. N. Tikhonov,et al.  Solutions of ill-posed problems , 1977 .

[71]  Jayanta Basak,et al.  Online Adaptive Decision Trees , 2004, Neural Computation.