An exponential-type kernel robust regression model for interval-valued variables

Abstract The presence of outliers is very common in regression problems, and robust regression methods are strongly recommended so that poorly fitted observations do not affect the parameter estimates of the model. Interval-valued variables are becoming common in data analysis problems, since this type of data represents either the uncertainty due to measurement error or the natural variability present in the data. Few robust regression methods have been proposed in the literature to deal with outliers in interval-valued data sets. This paper introduces a new robust regression method for interval-valued variables that penalizes the presence of outliers in the midpoints and/or in the ranges of interval-valued observations through the use of exponential-type kernel functions. The weight given to the midpoint and range of each interval-valued observation is updated at each iteration in order to optimize a suitable objective function. The parameter estimation algorithm is guaranteed to converge, at low computational cost. A comparative study of the proposed method against some previous robust regression approaches for interval-valued variables is also presented. The performance of these methods is evaluated based on the bias and mean squared error (MSE) of the parameter estimates for the midpoints and ranges of the intervals, considering synthetic data sets with X-space outliers, Y-space outliers and leverage points, and different sample sizes and percentages of outliers, in a Monte Carlo framework. The results suggest that the proposed approach performs competitively with, or better than, the previous approaches in interval-valued outlier scenarios comparable to those found in practice. Applications to real interval-valued data sets corroborate the usefulness of the proposed method.
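The iterative reweighting idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a Gaussian kernel as the exponential-type kernel, a bandwidth fixed from the median squared residual of an ordinary least squares start, and separate linear models for midpoints and ranges. The function names `kernel_weighted_fit` and `fit_interval_model` are hypothetical.

```python
import numpy as np


def kernel_weighted_fit(X, y, n_iter=50, tol=1e-8):
    """Iteratively reweighted least squares with Gaussian-kernel weights.

    Assumption (not from the paper): the weight of observation i is
    exp(-r_i^2 / (2 * g2)), where r_i is its current residual and g2 is
    fixed once, as the median squared residual of the initial OLS fit.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]    # OLS starting point
    g2 = np.median((y - X1 @ beta) ** 2) + 1e-12    # fixed kernel bandwidth
    w = np.ones(len(y))
    for _ in range(n_iter):
        r = y - X1 @ beta
        w = np.exp(-r ** 2 / (2.0 * g2))            # large residual -> small weight
        WX = X1 * w[:, None]
        beta_new = np.linalg.solve(X1.T @ WX, X1.T @ (w * y))  # weighted LS step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w


def fit_interval_model(x_lo, x_hi, y_lo, y_hi):
    """Fit separate kernel-weighted models to the midpoints and the ranges."""
    x_lo, x_hi = np.asarray(x_lo, float), np.asarray(x_hi, float)
    y_lo, y_hi = np.asarray(y_lo, float), np.asarray(y_hi, float)
    mid = kernel_weighted_fit((x_lo + x_hi) / 2.0, (y_lo + y_hi) / 2.0)
    rng = kernel_weighted_fit(x_hi - x_lo, y_hi - y_lo)
    return {"mid": mid, "range": rng}
```

Each entry of the returned dictionary holds the coefficient vector (intercept first) and the final observation weights, so an observation can be down-weighted in the midpoint model, the range model, or both.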
