A System for Induction of Oblique Decision Trees

This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.
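To make the idea of searching for an oblique split concrete, the following is a minimal sketch, not OC1's actual procedure. It assumes a split of the form w·x + b > 0, scores it with weighted Gini impurity, and improves it by perturbing one coefficient at a time. The two randomization steps used here (a random jump when the climb stalls, and several random restarts) are illustrative assumptions, as are all function names, step sizes, and the toy data.

```python
import random

def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_impurity(w, X, y):
    """Weighted Gini impurity of the split sum(w[i] * x[i]) + w[-1] > 0."""
    left, right = [], []
    for x, label in zip(X, y):
        side = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
        (right if side > 0 else left).append(label)
    return (len(left) * gini(left) + len(right) * gini(right)) / len(y)

def hill_climb(w, X, y, rng, step=1.0, max_rounds=50):
    """Deterministic coordinate-wise climb with a random jump when stuck."""
    best = split_impurity(w, X, y)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(w)):
            for delta in (step, -step, step / 4, -step / 4):
                cand = list(w)
                cand[i] += delta
                score = split_impurity(cand, X, y)
                if score < best:
                    w, best, improved = cand, score, True
        if not improved:
            # Assumed randomization 1: try one random direction to escape.
            cand = [wi + rng.gauss(0, 1) for wi in w]
            score = split_impurity(cand, X, y)
            if score < best:
                w, best = cand, score
            else:
                break
    return w, best

def find_oblique_split(X, y, restarts=10, seed=0):
    """Assumed randomization 2: restart from several random hyperplanes."""
    rng = random.Random(seed)
    dim = len(X[0])
    best_w, best_score = None, float("inf")
    for _ in range(restarts):
        w0 = [rng.uniform(-1, 1) for _ in range(dim + 1)]
        w, score = hill_climb(w0, X, y, rng)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

if __name__ == "__main__":
    # Toy data: the class depends on x1 + x2, so no single attribute separates it well.
    X = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.3), (0.9, 0.8), (0.1, 0.7), (0.8, 0.6)]
    y = [0, 1, 0, 1, 0, 1]
    w, score = find_oblique_split(X, y)
    print("hyperplane coefficients:", w, "impurity:", score)
```

An axis-parallel method would restrict `w` to a single nonzero attribute coefficient; letting every coefficient vary is what makes the split oblique, at the cost of a much larger search space, which is where the randomization is meant to help.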
