Discriminant Analysis of Interval Data: An Assessment of Parametric and Distance-Based Approaches

Building on probabilistic models for interval-valued variables, parametric classification rules based on Normal or Skew-Normal distributions are derived for interval data. The performance of these rules is then compared with that of previously investigated distance-based methods. The results show that Gaussian parametric approaches outperform Skew-Normal parametric and distance-based ones in most of the conditions analyzed. In particular, with heteroscedastic data a quadratic Gaussian rule always performs best. Moreover, restricted configurations of the variance-covariance matrix lead to parsimonious rules which, for small training samples in heteroscedastic problems, can outperform unrestricted quadratic rules, even in some cases where the model assumed by these rules does not hold. These restrictions take into account the particular nature of interval data, where observations are defined by both MidPoints and Ranges, which may or may not be correlated. Under homoscedastic conditions linear Gaussian rules are often the best, but distance-based methods may perform better in very specific conditions.
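To make the MidPoint and Range representation concrete, the following is a minimal sketch of how a quadratic Gaussian rule could be applied to interval data. It is an illustration under stated assumptions, not the authors' implementation: the log transform of the Ranges, the function names (`to_features`, `fit_quadratic_rule`, `predict`, `make_intervals`), the plug-in estimates with an unrestricted covariance matrix per class, and the synthetic data are all hypothetical choices made here.

```python
import numpy as np

def to_features(lower, upper):
    """Map interval bounds of shape (n, p) to MidPoints and log-Ranges, shape (n, 2p).

    Taking the log of the Ranges is an assumption of this sketch.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    mid = (lower + upper) / 2.0
    log_range = np.log(upper - lower)
    return np.hstack([mid, log_range])

def fit_quadratic_rule(X, y):
    """Estimate a prior, a mean vector and an unrestricted covariance per class."""
    params = {}
    for label in np.unique(y):
        Xg = X[y == label]
        params[label] = (len(Xg) / len(X), Xg.mean(axis=0), np.cov(Xg, rowvar=False))
    return params

def predict(params, X):
    """Assign each row to the class with the largest Gaussian discriminant score."""
    labels = list(params)
    scores = []
    for label in labels:
        prior, mean, cov = params[label]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        # Squared Mahalanobis distance of every row to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * maha)
    return np.array(labels)[np.argmax(scores, axis=0)]

def make_intervals(rng, n, center, spread, width):
    """Draw n synthetic intervals for one variable (hypothetical test data)."""
    mid = rng.normal(center, spread, size=(n, 1))
    rng_width = rng.uniform(0.5 * width, 1.5 * width, size=(n, 1))
    return mid - rng_width / 2.0, mid + rng_width / 2.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lo0, up0 = make_intervals(rng, 100, center=0.0, spread=1.0, width=1.0)
    lo1, up1 = make_intervals(rng, 100, center=6.0, spread=2.0, width=4.0)
    X = to_features(np.vstack([lo0, lo1]), np.vstack([up0, up1]))
    y = np.array([0] * 100 + [1] * 100)
    pred = predict(fit_quadratic_rule(X, y), X)
    print("training accuracy:", (pred == y).mean())
```

In this sketch the two classes differ in both location and dispersion, which is the heteroscedastic setting in which a quadratic rule is appropriate. Restricting the covariance matrix, for instance by assuming that MidPoints and Ranges are uncorrelated, would amount to zeroing the corresponding blocks of `cov` before scoring.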
