Why are neural networks sometimes much more accurate than decision trees: an analysis on a bio-informatics problem

Bio-informatics data sets may be large in the number of examples and/or the number of features. Predicting the secondary structure of proteins from amino acid sequences is one example of a high-dimensional problem for which large training sets exist; the data from the KDD Cup 2001 on the binding of compounds to thrombin is another very high-dimensional example. Data sets of this type can require significant computing resources to train a neural network, and in general decision trees require much less training time than neural networks. A number of studies have examined the relative advantages of decision trees and neural networks on specific data sets, finding differences that are often statistically significant but typically not very large. Here, we examine one case in which a neural network greatly outperforms a decision tree: predicting the secondary structure of proteins. The hypothesis that the neural network learns important features of the data through its hidden units is explored by using the trained network to transform the data before decision tree training. Experiments show that this explains some of the performance difference, but not all. Ensembles of decision trees are also compared with a single neural network. We conclude that the problem of protein secondary structure prediction exhibits characteristics that are fundamentally better exploited by a neural network model.
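
The hidden-unit transform described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn, a synthetic data set standing in for the protein data, and a single ReLU hidden layer. The trained network's hidden-unit activations become the input features for a decision tree, which can then be compared against a tree trained on the raw inputs and against a bagged tree ensemble:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic stand-in for a high-dimensional classification task.
    X, y = make_classification(n_samples=2000, n_features=50, n_informative=20,
                               n_classes=3, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # A single-hidden-layer network; its hidden units define the transform.
    net = MLPClassifier(hidden_layer_sizes=(30,), max_iter=1000, random_state=0)
    net.fit(X_tr, y_tr)

    def hidden_features(mlp, X):
        """Project inputs through the fitted MLP's first (hidden) layer."""
        z = X @ mlp.coefs_[0] + mlp.intercepts_[0]
        return np.maximum(z, 0.0)  # MLPClassifier's default activation is ReLU

    # Tree on raw inputs vs. tree on the network's hidden-unit features,
    # plus a bagged tree ensemble for comparison against the single network.
    raw_tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    nn_tree = DecisionTreeClassifier(random_state=0).fit(
        hidden_features(net, X_tr), y_tr)
    bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                               random_state=0).fit(X_tr, y_tr)

    print("network:         ", net.score(X_te, y_te))
    print("raw tree:        ", raw_tree.score(X_te, y_te))
    print("transformed tree:", nn_tree.score(hidden_features(net, X_te), y_te))
    print("bagged trees:    ", bagged.score(X_te, y_te))

If the hidden-unit hypothesis held completely, the tree trained on the transformed features would close the gap with the network; the paper reports that it closes only part of it.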
