Missing or Inapplicable: Treatment of Incomplete Continuous-valued Features in Supervised Learning

Real-world data are often riddled with quality problems such as noise, outliers, and missing values, which pose significant challenges for supervised learning algorithms. This paper explores the ill effects of inapplicable features on the performance of supervised learning algorithms. In particular, we highlight the difference between missing and inapplicable feature values: a missing value exists but was not recorded, whereas an inapplicable value does not exist for the record at all. We argue that current approaches for dealing with missing values, which are mostly based on single or multiple imputation, are insufficient for handling inapplicable features, especially those that are continuous-valued. We also illustrate how current tree-based and kernel-based classifiers can be adversely affected by the presence of such features if they are not handled appropriately. Finally, we propose methods to extend existing tree-based and kernel-based classifiers to deal with inapplicable continuous-valued features.
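To make the distinction concrete, the following is a minimal sketch (hypothetical names and data; not the paper's proposed method) contrasting mean imputation, which fabricates a value for records where a continuous feature has no meaning, with a three-way tree split that keeps inapplicable records in their own branch:

```python
INAPPLICABLE = None  # sentinel for "this feature does not apply to the record"

def three_way_split(records, feature, threshold):
    """Route records into three branches instead of imputing:
    values <= threshold, values > threshold, and records where the
    feature is inapplicable (a separate branch, not forced into
    either side)."""
    left, right, not_applicable = [], [], []
    for rec in records:
        v = rec.get(feature, INAPPLICABLE)
        if v is INAPPLICABLE:
            not_applicable.append(rec)
        elif v <= threshold:
            left.append(rec)
        else:
            right.append(rec)
    return left, right, not_applicable

def mean_impute(records, feature):
    """Baseline treatment: replace inapplicable values with the mean
    of the applicable ones, inventing a value that has no real-world
    interpretation for those records."""
    vals = [r[feature] for r in records if r.get(feature) is not INAPPLICABLE]
    mean = sum(vals) / len(vals)
    return [
        dict(r, **{feature: mean}) if r.get(feature) is INAPPLICABLE else dict(r)
        for r in records
    ]
```

For example, a "gestation weeks" feature is inapplicable, not merely unrecorded, for patients who were never pregnant; mean imputation would silently assign such patients an average gestation length, while the three-way split keeps them out of both numeric branches.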
