A Novel EagerDT Complexity Approach to Deal with Missing Values in Decision Trees

Incomplete data degrades the classification accuracy of decision trees. 'Lazy' approaches avoid missing values at testing time by considering only those attributes whose values are known for the test instance, and hence provide the best accuracy. We propose EagerDT, a variant of the Eager Decision Tree, which builds a single classification model at training time by accounting for the possibility of unknown values at every node of the tree. Like the lazy strategy, this eliminates the problem of missing values at testing, with the key advantage over the lazy approach that only a single tree is created, at training time. We analyze the complexity of the EagerDT algorithm, compare it with that of regular decision trees, and propose several novel approaches to reduce this complexity.
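To illustrate the idea, the following is a minimal sketch, not the authors' implementation, assuming that EagerDT handles unknown values by growing, at every split node, an extra "unknown" branch trained on the same examples but with the split attribute excluded, so a test instance missing that value can still be routed down the tree. All names here (build_eager_tree, classify, the outlook/windy toy attributes) are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attrs):
    """Pick the categorical attribute with the highest information gain."""
    base = entropy(labels)
    def gain(a):
        split = {}
        for row, y in zip(rows, labels):
            split.setdefault(row[a], []).append(y)
        return base - sum(len(ys) / len(labels) * entropy(ys) for ys in split.values())
    return max(attrs, key=gain)

def build_eager_tree(rows, labels, attrs):
    """Grow a tree in which every internal node also carries an 'unknown' branch,
    built on the same examples but without the split attribute, so instances
    missing that value can still be classified at testing time (sketch of the
    assumed EagerDT idea)."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return {"leaf": majority}
    a = best_attribute(rows, labels, attrs)
    rest = [x for x in attrs if x != a]
    partitions = {}
    for row, y in zip(rows, labels):
        partitions.setdefault(row[a], ([], []))
        partitions[row[a]][0].append(row)
        partitions[row[a]][1].append(y)
    node = {"attr": a, "majority": majority, "children": {}}
    for value, (sub_rows, sub_labels) in partitions.items():
        node["children"][value] = build_eager_tree(sub_rows, sub_labels, rest)
    # Eager "unknown" branch: same rows, split attribute excluded,
    # mirroring what a lazy learner would do only at testing time.
    node["unknown"] = build_eager_tree(rows, labels, rest) if rest else {"leaf": majority}
    return node

def classify(tree, instance):
    """Follow known attribute values; take the 'unknown' branch when a value is missing."""
    node = tree
    while "leaf" not in node:
        value = instance.get(node["attr"])          # None => value is missing
        if value is None:
            node = node["unknown"]
        else:
            node = node["children"].get(value, {"leaf": node["majority"]})
    return node["leaf"]

if __name__ == "__main__":
    rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
            {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
    labels = ["play", "stay", "stay", "play"]
    tree = build_eager_tree(rows, labels, ["outlook", "windy"])
    print(classify(tree, {"outlook": "sunny"}))     # 'windy' missing -> unknown branch
```

The duplicated subtrees grown for the "unknown" branches are what drive the higher training-time complexity of such an eager model relative to a regular decision tree, which is the cost the proposed reduction approaches target.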
