Comparing FBTSeg and NNTree Implementations with Established Ensemble Methods

This paper compares implementations of FBTSeg, a recent experimental segmentation method, and NNTree, a segmentation method originally based on neural network trees, with established methods for combining classifiers, namely bagging, boosting, and traditional segmentation using information gain as the splitting criterion. The tests were carried out with three data mining techniques of distinct characteristics, specifically linear regression, logistic regression, and multilayer perceptron neural networks, on four artificially constructed datasets. The datasets were designed to reveal the specific circumstances under which each method, simple or combined, performs best. The results suggest that combining classifiers through segmentation is a viable way to improve the performance of both statistical regression techniques, that FBTSeg and NNTree are in general more predictive than traditional segmentation, and that bagging and boosting are more effective alternatives for improving neural network models.
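
For intuition, here is a minimal sketch of the segmentation-based combination idea, assuming scikit-learn: a shallow decision tree grown with the entropy criterion (i.e., information gain) partitions the data into segments, and a separate logistic regression is fitted inside each segment. The `SegmentedClassifier` class, its parameters, and the synthetic data are illustrative assumptions; this is the traditional-segmentation baseline only, not the paper's FBTSeg or NNTree implementations.

```python
# Sketch of segmentation-based classifier combination: an entropy-split
# (information gain) decision tree defines the segments, and one base
# model is trained per segment. Illustrative only, not the paper's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


class SegmentedClassifier:
    def __init__(self, depth=2):
        # The entropy criterion corresponds to splitting by information gain.
        self.splitter = DecisionTreeClassifier(criterion="entropy",
                                               max_depth=depth)
        self.models = {}

    def fit(self, X, y):
        self.splitter.fit(X, y)
        leaves = self.splitter.apply(X)          # segment id per sample
        for leaf in np.unique(leaves):
            mask = leaves == leaf
            # A per-segment model needs both classes present in the segment;
            # pure segments fall back to the tree's own prediction below.
            if len(np.unique(y[mask])) > 1:
                self.models[leaf] = LogisticRegression().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        leaves = self.splitter.apply(X)
        out = self.splitter.predict(X)           # fallback for pure segments
        for leaf, model in self.models.items():
            mask = leaves == leaf
            if mask.any():
                out[mask] = model.predict(X[mask])
        return out


# Usage on a synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
clf = SegmentedClassifier(depth=2).fit(X[:800], y[:800])
print("accuracy:", (clf.predict(X[800:]) == y[800:]).mean())
```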
