A Nonparametric Kernel Approach to Interval-Valued Data Analysis

This article concerns datasets in which the variables are recorded as intervals, obtained by aggregating the values of variables in a larger dataset. We propose viewing the observed set of hyper-rectangles as an empirical histogram and using a Gaussian kernel-type estimator to approximate its underlying distribution nonparametrically. We apply this idea to both univariate density estimation and regression problems. Unlike many existing methods for regression analysis, the proposed method can estimate the conditional distribution of the response variable for any given set of predictors, even when some of them are not interval-valued. Empirical studies show that the proposed approach is flexible across a range of scenarios with complex relationships between the locations and widths of the response and predictor intervals.
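To make the univariate idea concrete, the following is a minimal sketch and not necessarily the estimator defined in the article. It assumes each observed interval [a_i, b_i] contributes a uniform histogram bar of mass 1/n, and that each bar is smoothed by a Gaussian kernel with bandwidth h. Under that assumption the estimate has the closed form f(x) = (1/n) * sum_i [Phi((x - a_i)/h) - Phi((x - b_i)/h)] / (b_i - a_i), where Phi is the standard normal CDF. The function name `interval_kde`, the fixed bandwidth, and the toy intervals are all hypothetical choices made for illustration.

```python
import math


def _norm_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_kde(x, intervals, bandwidth):
    """Gaussian-smoothed histogram density at x for interval-valued data.

    Assumption (not taken from the article): each interval (lo, hi) is
    treated as a uniform distribution on [lo, hi], and the equal-weight
    mixture of these uniforms is convolved with a Gaussian kernel of
    standard deviation `bandwidth`.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not intervals:
        raise ValueError("at least one interval is required")

    total = 0.0
    for lo, hi in intervals:
        if hi < lo:
            raise ValueError("each interval must satisfy lo <= hi")
        if hi == lo:
            # Degenerate interval (a single point): ordinary Gaussian kernel.
            z = (x - lo) / bandwidth
            total += math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))
        else:
            # Uniform on [lo, hi] convolved with the Gaussian kernel.
            upper = _norm_cdf((x - lo) / bandwidth)
            lower = _norm_cdf((x - hi) / bandwidth)
            total += (upper - lower) / (hi - lo)
    return total / len(intervals)


if __name__ == "__main__":
    # Toy data: two proper intervals and one point-valued observation.
    toy_intervals = [(0.0, 2.0), (1.0, 4.0), (3.0, 3.0)]
    for point in (0.0, 1.5, 3.0, 6.0):
        print(point, interval_kde(point, toy_intervals, bandwidth=0.5))
```

The point-valued case is handled separately so that the same function covers data in which some observations are not interval-valued. A multivariate or regression version would need a product kernel over hyper-rectangles and a data-driven bandwidth, neither of which is attempted here.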
