Dissimilarity-based classification of data with missing attributes

In many real-world applications, objects have missing attributes. Conventional techniques for classifying such data operate in a feature space and usually require imputation, modification of the classifier, or both. In this paper, we propose two classification alternatives based on dissimilarities, which are appealing for classifying data with missing attributes. The results obtained with both approaches are better than those of the feature-space techniques. Moreover, the proposed approaches require hardly any additional computation, such as imputation or classifier updating.
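To make the general idea concrete, the following is a minimal sketch of how a dissimilarity can be computed directly on incomplete vectors, with no imputation step. It is not the method proposed in the paper, whose two approaches are not detailed here. It assumes that missing attributes are encoded as NaN, uses a rescaled partial Euclidean distance over the attributes observed in both objects, and classifies with a plain nearest-neighbor rule. The function names and the toy data are hypothetical.

```python
import numpy as np


def partial_distance(a, b):
    """Euclidean distance over the attributes observed in both a and b,
    rescaled to the full dimensionality (an illustrative assumption)."""
    both = ~np.isnan(a) & ~np.isnan(b)
    if not both.any():
        return np.nan  # no shared attribute: the dissimilarity is undefined
    scale = a.size / both.sum()
    return float(np.sqrt(scale * np.sum((a[both] - b[both]) ** 2)))


def dissimilarity_matrix(X, prototypes):
    """Dissimilarities between every row of X and every prototype row."""
    return np.array([[partial_distance(x, p) for p in prototypes] for x in X])


def nearest_neighbor_predict(D, prototype_labels):
    """Label each object by its closest prototype, ignoring undefined entries.
    np.nanargmin raises ValueError if a row has no defined dissimilarity."""
    return prototype_labels[np.nanargmin(D, axis=1)]


# Toy data (hypothetical): NaN marks a missing attribute.
train = np.array([
    [0.0, 0.1, np.nan],
    [0.2, np.nan, 0.0],
    [1.0, 0.9, 1.1],
    [np.nan, 1.0, 1.0],
])
train_labels = np.array([0, 0, 1, 1])
test = np.array([
    [0.1, np.nan, 0.1],
    [np.nan, 0.9, np.nan],
])

D = dissimilarity_matrix(test, train)
predictions = nearest_neighbor_predict(D, train_labels)
print(predictions)
```

In this sketch the rows of D can also serve as input vectors for any standard classifier trained in the dissimilarity space, provided the undefined entries are handled first.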
