Constrained linear regression models for symbolic interval-valued variables

This paper introduces an approach to fitting a constrained linear regression model to interval-valued data. Each example in the learning set is described by a feature vector in which every feature value is an interval. The new approach fits a constrained linear regression model to the midpoints and ranges of the interval values taken by the variables in the learning set. The lower and upper boundaries of the interval value of the dependent variable are predicted from its midpoint and range, which are in turn estimated by applying the fitted linear regression models to the midpoint and range of each interval value of the independent variables. The method shows the importance of range information for prediction performance, as well as the use of inequality constraints to ensure mathematical coherence between the predicted values of the lower ($\hat{y}_{Li}$) and upper ($\hat{y}_{Ui}$) boundaries of the interval. The authors also propose an expression for a goodness-of-fit measure, the coefficient of determination. The proposed prediction method is assessed by estimating the average behavior of the root-mean-square error and of the square of the correlation coefficient in a Monte Carlo experiment with different data set configurations. Among other aspects, the synthetic data sets take into account the dependence, or lack thereof, between the midpoint and range of the intervals. The bias produced by imposing inequality constraints on the vector of parameters is also examined in terms of the mean-square error of the parameter estimates. Finally, the approaches proposed in this paper are applied to a real data set and their performances are compared.
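
The midpoint-and-range idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the inequality constraints take the form of non-negative coefficients in the range model, which is one way to guarantee a non-negative predicted range (and hence a lower boundary that never exceeds the upper one) when the input ranges are themselves non-negative. The function names `fit_interval_regression` and `predict_interval` are hypothetical, and the non-negative least-squares step uses `scipy.optimize.nnls`.

```python
import numpy as np
from scipy.optimize import nnls


def fit_interval_regression(X_lo, X_hi, y_lo, y_hi):
    """Fit one linear model on midpoints and one on ranges (illustrative sketch)."""
    X_mid = (X_lo + X_hi) / 2.0   # midpoints of the independent intervals
    X_rng = X_hi - X_lo           # ranges of the independent intervals
    y_mid = (y_lo + y_hi) / 2.0
    y_rng = y_hi - y_lo

    # Design matrices with an intercept column.
    A_mid = np.column_stack([np.ones(len(y_mid)), X_mid])
    A_rng = np.column_stack([np.ones(len(y_rng)), X_rng])

    # Midpoint model: ordinary least squares (assumed unconstrained here).
    beta_mid, *_ = np.linalg.lstsq(A_mid, y_mid, rcond=None)

    # Range model: least squares with all coefficients constrained to be >= 0
    # (an assumed form of the inequality constraints).
    beta_rng, _ = nnls(A_rng, y_rng)
    return beta_mid, beta_rng


def predict_interval(beta_mid, beta_rng, X_lo, X_hi):
    """Rebuild lower and upper boundaries from predicted midpoint and range."""
    X_mid = (X_lo + X_hi) / 2.0
    X_rng = X_hi - X_lo
    mid = np.column_stack([np.ones(len(X_mid)), X_mid]) @ beta_mid
    rng = np.column_stack([np.ones(len(X_rng)), X_rng]) @ beta_rng
    return mid - rng / 2.0, mid + rng / 2.0
```

Because the range coefficients and the input ranges are all non-negative in this sketch, the predicted range cannot be negative, so the predicted lower boundary is never above the predicted upper boundary.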
