Off the beaten track: A new linear model for interval data

We propose a new linear regression model for interval-valued variables. The model represents each interval by a quantile function, thereby taking into account the distribution of values within it. In this paper we study the special case in which a Uniform distribution is assumed within each observed interval, and we analyze the extension to the Symmetric Triangular distribution. The model parameters are obtained by solving a constrained quadratic optimization problem based on the Mallows distance between quantile functions. As in the classical case, a goodness-of-fit measure is derived. Two applications in topical fields are presented: one predicts the duration of unemployment, and the other forecasts the area burned by forest fires.
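The abstract compresses several technical steps into a few sentences. What follows is a minimal Python sketch of those steps under stated assumptions. Each interval is written by its centre c and half-range r. Under the Uniform assumption its quantile function is Q(t) = c + r(2t - 1), and the squared Mallows distance is the integral over [0, 1] of (Q1(t) - Q2(t))^2. The linear form Q_Y(t) ~ v + a Q_X(t) - b Q_X(1 - t) with a, b >= 0 is our assumption about how a constrained quadratic problem of this kind can be set up. All function and constant names are hypothetical. This is not the authors' implementation, and their goodness-of-fit measure is not reproduced.

```python
import math

# Assumed constants: integral over [0, 1] of g(t)^2, where g is the standardised
# quantile function of the distribution assumed on [-1, 1] inside each interval.
K_UNIFORM = 1.0 / 3.0         # variance of U(-1, 1)
K_SYM_TRIANGULAR = 1.0 / 6.0  # variance of the symmetric triangular law on [-1, 1]


def quantile_uniform(c, r, t):
    """Quantile function of the Uniform distribution on [c - r, c + r]."""
    return c + r * (2.0 * t - 1.0)


def quantile_sym_triangular(c, r, t):
    """Quantile function of the Symmetric Triangular distribution on [c - r, c + r]."""
    g = -1.0 + math.sqrt(2.0 * t) if t <= 0.5 else 1.0 - math.sqrt(2.0 * (1.0 - t))
    return c + r * g


def mallows_sq(c1, r1, c2, r2, k=K_UNIFORM):
    """Squared Mallows distance between two intervals carrying the same symmetric
    distribution. The cross term vanishes because g(1 - t) = -g(t)."""
    return (c1 - c2) ** 2 + k * (r1 - r2) ** 2


def fit_interval_model(cx, rx, cy, ry, k=K_UNIFORM):
    """Hypothetical sketch: fit Q_Y(t) ~ v + a*Q_X(t) - b*Q_X(1 - t) with a, b >= 0
    by minimising the sum of squared Mallows distances. The problem is a convex QP,
    solved here by enumerating the active sets of the two constraints."""
    n = len(cx)
    mx, my = sum(cx) / n, sum(cy) / n
    sxx = sum((x - mx) ** 2 for x in cx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cx, cy))
    rxx = sum(r * r for r in rx)
    rxy = sum(p * q for p, q in zip(rx, ry))

    def sse(a, b):
        # Predicted interval: centre v + (a - b) * c_x, half-range (a + b) * r_x,
        # with the intercept v set to its optimal value for the given a and b.
        v = my - (a - b) * mx
        return sum(mallows_sq(y, s, v + (a - b) * x, (a + b) * r, k)
                   for x, r, y, s in zip(cx, rx, cy, ry))

    alpha, beta = sxy / sxx, rxy / rxx  # minimiser with no constraint active
    candidates = [
        ((alpha + beta) / 2.0, (beta - alpha) / 2.0),
        (max(0.0, (sxy + k * rxy) / (sxx + k * rxx)), 0.0),  # active set b = 0
        (0.0, max(0.0, (k * rxy - sxy) / (sxx + k * rxx))),  # active set a = 0
    ]
    feasible = [(a, b) for a, b in candidates if a >= 0.0 and b >= 0.0]
    a, b = min(feasible, key=lambda ab: sse(*ab))
    return my - (a - b) * mx, a, b
```

In this sketch the only difference between the Uniform and the Symmetric Triangular case is the constant k: the squared distance is (c1 - c2)^2 + (r1 - r2)^2/3 in the first case and (c1 - c2)^2 + (r1 - r2)^2/6 in the second. Because a Q_X(t) - b Q_X(1 - t) has centre (a - b)c and half-range (a + b)r, the non-negativity constraints on a and b keep predicted half-ranges non-negative while still allowing a decreasing relation between centres.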
