Randomly selected decision tree for test-cost sensitive learning

Highlights
Test-cost sensitive learning is often desirable in many real-world applications.
We reviewed the related work on test-cost sensitive decision tree learning.
We propose a new test-cost sensitive decision tree learning algorithm.
We conduct a random search to find an appropriate attribute to test at each node.
Experimental results on a large number of datasets validate its effectiveness.

In many real-world applications, decision trees that take into account the cost of acquiring attributes for decision making have become a research focus. The decision-making process must learn which sequence of tests to perform, and how to build an inexpensive and reliable inductive learning model to accomplish its task. Many previous works on test-cost sensitive decision tree learning have successfully reduced the total test cost but, unfortunately, have also degraded classification accuracy at the same time. This paper builds on a new idea: reducing the total test cost does not have to come at the expense of classification accuracy. To this end, we propose a multi-target adaptive attribute selection measure and a simple but effective method for building and testing decision trees. Instead of using a greedy attribute selection measure like many other decision tree learning algorithms, our algorithm uses a random attribute selection measure to find an appropriate attribute to test at each node in the tree. Specifically, we conduct a random search through the whole space of attributes during tree building, and we call the resulting model the randomly selected decision tree (RSDT). In this way, RSDT significantly reduces the total test cost while maintaining higher classification accuracy than its competitors. Experimental results on 36 UCI datasets validate the effectiveness of the proposed RSDT.
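The abstract does not define RSDT's multi-target adaptive attribute selection measure, so the sketch below does not reproduce it. As a minimal, hypothetical illustration of the two mechanics the abstract names, it shows (a) choosing the attribute to test at each node by random selection instead of a greedy measure and (b) charging the test cost of every attribute measured on an instance's root-to-leaf path. Drawing each split attribute uniformly at random from the attributes not yet used on the path is a simplifying assumption, as are the toy data, the `TEST_COST` table, and the helper names `build_tree`, `classify`, and `evaluate`.

```python
import random
from collections import Counter

# Hypothetical sketch: uniform random split selection is a simplification,
# NOT the paper's multi-target adaptive attribute selection measure.

def majority(labels):
    """Most frequent class label."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, attrs, rng):
    """Grow a tree on categorical data, picking each split attribute at random."""
    if len(set(labels)) == 1 or not attrs:
        return {"leaf": majority(labels)}
    attr = rng.choice(sorted(attrs))  # random (non-greedy) attribute choice
    node = {"attr": attr, "default": majority(labels), "children": {}}
    remaining = attrs - {attr}  # an attribute is tested at most once per path
    for value in sorted({r[attr] for r in rows}):
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, rng)
    return node

def classify(tree, row, test_cost):
    """Return (predicted label, total cost of the tests performed on the path)."""
    cost = 0.0
    while "leaf" not in tree:
        attr = tree["attr"]
        cost += test_cost[attr]  # pay to measure this attribute
        child = tree["children"].get(row[attr])
        if child is None:  # unseen value: fall back to the node's majority class
            return tree["default"], cost
        tree = child
    return tree["leaf"], cost

def evaluate(tree, rows, labels, test_cost):
    """Accuracy and average test cost per classified instance."""
    results = [classify(tree, r, test_cost) for r in rows]
    acc = sum(pred == y for (pred, _), y in zip(results, labels)) / len(labels)
    avg_cost = sum(c for _, c in results) / len(labels)
    return acc, avg_cost

# Toy example (assumed data): a cheap test and an expensive, decisive one.
rows = [
    {"fever": "yes", "xray": "pos"},
    {"fever": "yes", "xray": "neg"},
    {"fever": "no", "xray": "pos"},
    {"fever": "no", "xray": "neg"},
]
labels = ["sick", "healthy", "sick", "healthy"]
TEST_COST = {"fever": 1.0, "xray": 50.0}

tree = build_tree(rows, labels, set(TEST_COST), random.Random(0))
acc, avg_cost = evaluate(tree, rows, labels, TEST_COST)
print(acc, avg_cost)
```

Depending on which attribute the random draw places at the root, every instance either pays only for `xray` (50.0) or for both tests (51.0), while training accuracy is the same. This is the kind of cost/accuracy trade-off that an attribute selection measure in a test-cost sensitive tree has to manage.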
