Centre and Range method for fitting a linear regression model to symbolic interval data

This paper introduces a new approach to fitting a linear regression model to symbolic interval data. Each example in the learning set is described by a feature vector in which every feature value is an interval. The new method fits a linear regression model to the mid-points and ranges of the interval values taken by the variables in the learning set. The lower and upper bounds of the interval value of the dependent variable are predicted from its mid-point and range, which are estimated by applying the fitted linear regression model to the mid-point and range of each interval value of the independent variables. The proposed prediction method is assessed by estimating the average behaviour of both the root mean square error and the square of the correlation coefficient in a Monte Carlo experiment. Finally, the approaches presented in this paper are applied to a real data set and their performance is compared.
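The fitting and prediction steps described above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: the mid-points and the ranges are fitted by two separate ordinary least-squares regressions, "range" means the full width (upper minus lower bound), and the function names (`fit_ols`, `fit_centre_range`, `predict_interval`) are hypothetical and used for illustration only.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Apply fitted coefficients to a design matrix without intercept column."""
    return beta[0] + X @ beta[1:]

def fit_centre_range(X_lo, X_hi, y_lo, y_hi):
    """Assumed scheme: one regression on mid-points, one on ranges."""
    beta_mid = fit_ols((X_lo + X_hi) / 2.0, (y_lo + y_hi) / 2.0)
    beta_rng = fit_ols(X_hi - X_lo, y_hi - y_lo)
    return beta_mid, beta_rng

def predict_interval(model, X_lo, X_hi):
    """Rebuild the bounds from the predicted mid-point and range."""
    beta_mid, beta_rng = model
    mid = predict_ols(beta_mid, (X_lo + X_hi) / 2.0)
    rng = predict_ols(beta_rng, X_hi - X_lo)
    return mid - rng / 2.0, mid + rng / 2.0
```

Here `X_lo` and `X_hi` are arrays of shape `(n, p)` holding the lower and upper bounds of the independent variables, and `y_lo`, `y_hi` are arrays of length `n`. Note that an unconstrained least-squares fit on the ranges can yield a negative predicted range, in which case the predicted lower bound would exceed the upper bound; this sketch does not guard against that.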
