Missing Values in Dissimilarity-Based Classification of Multi-way Data

Missing values occur frequently in real-world data. This is the case in multi-way data applications, where objects are usually represented by arrays of two or more dimensions, e.g., biomedical signals represented as time-frequency matrices. Missing attributes tend to affect the analysis of the data; in classification tasks, for example, classifier performance usually deteriorates. The problem therefore needs to be addressed before classifiers are built. Although missing values are common in these types of data sets, only a few studies have tackled the problem for classification purposes. In this paper, we study two approaches to overcoming the missing values problem in dissimilarity-based classification of multi-way data: imputation by factorization, and a modification of the previously proposed Continuous Multi-way Shape measure for comparing multi-way objects.
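To make the first approach concrete, the following is a minimal sketch of imputation by factorization on a single two-way object (for instance, one time-frequency matrix). It is an illustration under stated assumptions, not the procedure used in the paper: the abstract does not specify the factorization model, so a truncated SVD is swapped in here as a simple low-rank stand-in. The function name `impute_low_rank` and the parameters `rank`, `n_iter`, and `tol` are hypothetical.

```python
import numpy as np

def impute_low_rank(X, rank=1, n_iter=200, tol=1e-9):
    """Fill NaN entries of a 2-D array by repeatedly fitting a low-rank model.

    Assumption: a truncated SVD stands in for the factorization; the
    source does not state which model the authors use.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialize the missing entries with the mean of the observed ones.
    filled = np.where(missing, np.nanmean(X), X)
    for _ in range(n_iter):
        # Fit a rank-`rank` approximation to the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Keep observed entries fixed; overwrite only the missing ones.
        updated = np.where(missing, approx, X)
        converged = np.linalg.norm(updated - filled) < tol
        filled = updated
        if converged:
            break
    return filled

# Usage: a rank-1 matrix with one entry removed.
X_true = np.outer([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
X_obs = X_true.copy()
X_obs[0, 0] = np.nan
X_hat = impute_low_rank(X_obs, rank=1)
```

The completed objects could then be compared with any dissimilarity measure defined for complete data. The choice of `rank` is a modeling assumption and would need to be tuned for real data.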
