Lasso-constrained regression analysis for interval-valued data

A new method of regression analysis for interval-valued data is proposed. The relationship between an interval-valued response variable and a set of interval-valued explanatory variables is investigated through two regression models, one for the midpoints and the other for the radii. The estimation problem is approached by introducing Lasso-based constraints on the regression coefficients. These constraints can improve the prediction accuracy of the model and, because of their form, can sometimes produce a parsimonious model in which the midpoint and radius models share a common subset of regression coefficients. The effectiveness of the method, called Lasso-IR (Lasso-based Interval-valued Regression), is shown by a simulation experiment and some applications to real data.
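To make the midpoint/radius decomposition concrete, the following is a minimal sketch and not the Lasso-IR estimator itself. It assumes that each interval is given by its lower and upper bounds, that the midpoint model regresses response midpoints on explanatory midpoints, and that the radius model regresses response radii on explanatory radii. It fits the two models separately with an L1 penalty (the Lagrangian counterpart of a Lasso constraint) using a plain coordinate descent. The coupling that lets Lasso-IR share coefficients between the two models is not reproduced here. All function names and the penalty parameters `lam_mid` and `lam_rad` are hypothetical.

```python
import numpy as np


def to_mid_rad(lower, upper):
    """Convert interval bounds to midpoints and radii (half-widths)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    return (lower + upper) / 2.0, (upper - lower) / 2.0


def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=500):
    """Minimise (1/(2n)) * ||y - b0 - X b||^2 + lam * ||b||_1.

    Plain cyclic coordinate descent; the intercept b0 is not penalised.
    """
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring removes the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:             # constant column: leave at zero
                continue
            partial_resid = yc - Xc @ b + Xc[:, j] * b[j]
            rho = Xc[:, j] @ partial_resid / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    b0 = y_mean - x_mean @ b
    return b0, b


def fit_interval_lasso(X_lo, X_up, y_lo, y_up, lam_mid, lam_rad):
    """Fit separate L1-penalised models for midpoints and radii.

    Assumption: the two models are fitted independently, unlike Lasso-IR.
    """
    X_mid, X_rad = to_mid_rad(X_lo, X_up)
    y_mid, y_rad = to_mid_rad(y_lo, y_up)
    return {
        "mid": lasso_cd(X_mid, y_mid, lam_mid),
        "rad": lasso_cd(X_rad, y_rad, lam_rad),
    }


def predict_interval(model, X_lo, X_up):
    """Predict lower and upper bounds from the two fitted models."""
    X_mid, X_rad = to_mid_rad(X_lo, X_up)
    b0_m, b_m = model["mid"]
    b0_r, b_r = model["rad"]
    mid_hat = b0_m + X_mid @ b_m
    # Assumption: negative predicted radii are clipped to zero so that the
    # predicted interval is well formed; the sketch does not constrain them.
    rad_hat = np.maximum(b0_r + X_rad @ b_r, 0.0)
    return mid_hat - rad_hat, mid_hat + rad_hat
```

With `lam_mid = lam_rad = 0` the sketch reduces to two ordinary least-squares fits, and increasing either penalty shrinks the corresponding coefficients toward zero, which is the mechanism behind the parsimony mentioned above.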
