Robust weighted regression via PAELLA sample weights

Abstract: This paper reports the use of the occurrence vector produced by the PAELLA algorithm in the context of robust regression. PAELLA was originally conceived as an outlier detection and data cleaning technique. The novel approach here is to use the algorithm not to discard outliers but to generate information about the reliability of the observations recorded in the dataset. This approach compares favorably with common practice such as outlier removal. A set of experiments on a deliberately difficult artificial dataset is described, using both neural networks and classical polynomial fitting. Finally, a favorable comparison of our approach with two state-of-the-art algorithms supports the benefits of using the PAELLA algorithm in the context of robust regression.
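To make the idea of reliability-weighted fitting concrete, the following is a minimal sketch of weighted polynomial regression, not the authors' implementation. It assumes the occurrence vector counts how many times each sample was flagged as an outlier over a number of runs, and it converts those counts to weights with a simple linear mapping; both the mapping and the function names `occurrence_to_weights` and `weighted_polyfit` are hypothetical choices made for illustration. The occurrence counts themselves are hand-written here rather than produced by PAELLA.

```python
import numpy as np


def occurrence_to_weights(occurrences, n_runs):
    """Map outlier-occurrence counts to sample weights in [0, 1].

    Assumption (not from the paper): occurrences[i] is the number of runs,
    out of n_runs, in which sample i was flagged as an outlier. The linear
    mapping below is one illustrative choice among many.
    """
    occ = np.asarray(occurrences, dtype=float)
    return 1.0 - occ / n_runs


def weighted_polyfit(x, y, sample_weights, degree):
    """Weighted least-squares polynomial fit.

    np.polyfit minimizes sum((w_i * residual_i) ** 2), so the square root
    of the sample weights is passed to weight the squared residuals.
    """
    return np.polyfit(x, y, degree, w=np.sqrt(sample_weights))


# Synthetic data: a clean line y = 2x + 1 with five corrupted samples.
x = np.linspace(-1.0, 1.0, 50)
y = 2.0 * x + 1.0
outlier_idx = np.array([5, 15, 25, 35, 45])
y[outlier_idx] += 10.0

# Hand-written occurrence vector standing in for PAELLA output:
# the corrupted samples were flagged in every one of 20 runs.
n_runs = 20
occurrences = np.zeros_like(x)
occurrences[outlier_idx] = n_runs

weights = occurrence_to_weights(occurrences, n_runs)
weighted_coefs = weighted_polyfit(x, y, weights, degree=1)
plain_coefs = np.polyfit(x, y, 1)

print("weighted fit:", weighted_coefs)
print("unweighted fit:", plain_coefs)
```

In this sketch the flagged samples receive zero weight, which reduces to outlier removal; intermediate occurrence counts would yield fractional weights, letting partially suspicious samples still contribute to the fit.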
