Construction of Single Classifier from Multiple Interim Classification Trees

Summary

Predicting the future has always been a human pursuit, and supervised learning (predictive modeling, machine learning) is therefore one of the most widely used techniques in data mining. Discovering patterns and measuring the accuracy of predictions are active research areas. In this paper, we introduce a new method that generates a final optimal and accurate classifier from several interim classification trees built on different samples of the same dataset. Because the interim trees are merged into one classifier with the help of information gain theory, test data no longer has to be passed through each tree separately, which saves considerable time. The method is applied to the Drug Data used in the SPSS Clementine demonstration. The proposed method is simple, easy to understand, and easy to implement in a practical environment.
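The summary does not spell out how the interim trees are merged, so the following is only a minimal sketch of the information gain measure it refers to, not the merging procedure itself. It assumes Shannon entropy in bits and shows how a candidate split attribute could be scored. The function names, the attribute names (`bp`, `cholesterol`), and the four toy records are hypothetical and are not taken from the Drug Data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, attribute, target):
    """Entropy reduction obtained by splitting rows on one attribute."""
    base = entropy([row[target] for row in rows])
    # Group the target labels by the value each row takes for the attribute.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[attribute], []).append(row[target])
    # Weighted average entropy of the partitions after the split.
    remainder = sum(
        (len(part) / len(rows)) * entropy(part) for part in partitions.values()
    )
    return base - remainder


# Hypothetical toy records for illustration only (not the actual Drug Data).
rows = [
    {"bp": "high", "cholesterol": "normal", "drug": "A"},
    {"bp": "high", "cholesterol": "high", "drug": "A"},
    {"bp": "low", "cholesterol": "normal", "drug": "B"},
    {"bp": "low", "cholesterol": "high", "drug": "B"},
]

# Score each candidate attribute and pick the one with the highest gain.
gains = {attr: information_gain(rows, attr, "drug") for attr in ("bp", "cholesterol")}
best = max(gains, key=gains.get)
```

In this toy case `bp` separates the two classes perfectly while `cholesterol` carries no information about the class, so `bp` receives the higher gain. A merging scheme based on information gain would presumably use scores of this kind to decide which attribute tests from the interim trees to keep, but that step is an assumption here.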
