A cost sensitive decision tree algorithm based on weighted class distribution with batch deleting attribute mechanism

Minimal-cost classification is an important issue in data mining and machine learning. Recently, many enhanced algorithms based on C4.5 have been proposed to address it. A disadvantage of these methods is that they are inefficient on medium-sized or large data sets. To overcome this problem, we present a cost-sensitive decision tree algorithm based on a weighted class distribution with a batch deleting attribute mechanism (BDADT). In the BDADT algorithm, a heuristic function is designed for evaluating attributes during node selection. It combines a weighted information gain ratio, the test cost of the attribute, and a user-specified non-positive parameter that adjusts the influence of the test cost. In addition, a batch deleting attribute mechanism is incorporated into the algorithm: while nodes are being assigned, attributes that are redundant according to their heuristic-function values are deleted in batches, which improves the efficiency of decision tree construction. Experiments are conducted on 20 UCI data sets, with test costs generated from a representative Normal distribution, to evaluate the proposed BDADT algorithm. The results show that the average total costs obtained by the proposed algorithm are smaller than those of the existing CS-C4.5 and CS-GainRatio algorithms, and that the proposed algorithm significantly improves the efficiency of cost-sensitive decision tree construction.
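Because this is an abstract-level description, the exact formulas of BDADT are not given here. The sketch below is a rough illustration only: it assumes a C4.5-style gain ratio discounted by a test-cost penalty of the form (cost + 1)^lambda with a non-positive lambda, and a hypothetical batch-deletion rule that drops the lowest-scoring fraction of candidate attributes at each node. The function names (heuristic, select_and_prune), the penalty form, and the drop_ratio threshold are assumptions for illustration, not the paper's actual definitions, and the paper's class-weighting scheme is not reproduced.

```python
import numpy as np

# Illustrative sketch of a cost-sensitive attribute heuristic with batch
# deletion of weak candidates.  The penalty (cost + 1) ** lam and the
# bottom-fraction deletion rule are assumptions, not the BDADT formulas.

def entropy(labels):
    """Shannon entropy of a label vector."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """Standard C4.5-style information gain ratio for a nominal attribute."""
    base = entropy(labels)
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond = sum(w * entropy(labels[feature == v]) for v, w in zip(values, weights))
    split_info = -np.sum(weights * np.log2(weights))
    if split_info == 0:
        return 0.0
    return (base - cond) / split_info

def heuristic(feature, labels, test_cost, lam=-0.5):
    """Cost-sensitive attribute score: gain ratio scaled by a test-cost
    penalty.  lam <= 0 plays the role of the user-specified parameter that
    controls how strongly the test cost discounts the score.  (The paper's
    class weighting of the gain ratio is omitted in this sketch.)"""
    return gain_ratio(feature, labels) * (test_cost + 1.0) ** lam

def select_and_prune(X, y, costs, candidates, lam=-0.5, drop_ratio=0.2):
    """Pick the best attribute and batch-delete the weakest candidates.
    Attributes whose score falls in the bottom drop_ratio fraction are
    removed from the candidate set so later nodes never re-evaluate them
    (a hypothetical stand-in for the batch deleting attribute mechanism)."""
    scores = {a: heuristic(X[:, a], y, costs[a], lam) for a in candidates}
    best = max(scores, key=scores.get)
    ranked = sorted(scores, key=scores.get)   # ascending by score
    n_drop = int(len(ranked) * drop_ratio)
    surviving = ranked[n_drop:]
    return best, surviving

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(200, 5))         # five nominal attributes
    y = (X[:, 0] + rng.integers(0, 2, 200)) % 2   # labels correlated with attribute 0
    costs = [5.0, 1.0, 3.0, 2.0, 4.0]             # per-attribute test costs
    best, remaining = select_and_prune(X, y, costs, list(range(5)))
    print("chosen attribute:", best, "remaining candidates:", remaining)
```

Running the sketch prints the chosen attribute and the surviving candidate set; in an actual tree builder the surviving set would be passed down to the child nodes, so discarded attributes are never re-evaluated.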
