Determination of relative agrarian technical efficiency by a dynamic over-sampling procedure guided by minimum sensitivity

In this paper, a dynamic over-sampling procedure is proposed to improve the classification of imbalanced datasets with more than two classes. This procedure is incorporated into a hybrid algorithm (HA) that optimizes Multi Layer Perceptron neural networks (MLPs). To handle class imbalance, the training dataset is resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the class sizes. In the second stage, the HA is run and the dataset is over-sampled in different generations of the evolution, generating new patterns in the minimum sensitivity class (the class with the worst accuracy for the best MLP of the population). To evaluate the efficiency of the technique, we pose a complex problem: the classification of 1617 real farms into three classes (efficient, intermediate and inefficient) according to the Relative Technical Efficiency (RTE) obtained by Monte Carlo Data Envelopment Analysis (MC-DEA). The multi-classification model, named Dynamic SMOTE Hybrid Multi Layer Perceptron (DSHMLP), is compared to other standard classification methods with an over-sampling procedure in the preprocessing stage and to the threshold-moving method, in which the output threshold is moved toward inexpensive classes. The results show that our proposal improves minimum sensitivity in the generalization set (35.00%) while obtaining a high accuracy level (72.63%).
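The notion of a minimum sensitivity class, and the generation of new patterns in it, can be illustrated with a minimal Python sketch. It is not the DSHMLP implementation: the function names (`per_class_sensitivity`, `minimum_sensitivity_class`, `smote_like`), the neighbourhood size `k`, and the toy labels are assumptions made for illustration, and the interpolation step is a plain SMOTE-style scheme rather than the exact procedure used in the paper.

```python
import random
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Return, for each class, the fraction of its patterns classified correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def minimum_sensitivity_class(y_true, y_pred):
    """Return the worst-classified class and its sensitivity."""
    sens = per_class_sensitivity(y_true, y_pred)
    # Ties are broken by class name so the result is deterministic.
    worst = min(sens, key=lambda c: (sens[c], str(c)))
    return worst, sens[worst]


def smote_like(X, y, target, n_new, k=3, rng=None):
    """Create n_new synthetic patterns of class `target` by interpolating
    between a random pattern and one of its k nearest same-class neighbours.
    (Assumed SMOTE-style scheme, for illustration only.)"""
    rng = rng or random.Random(0)
    pool = [x for x, label in zip(X, y) if label == target]
    if len(pool) < 2:
        raise ValueError("need at least two patterns of the target class")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(pool)
        others = [p for p in pool if p is not base]
        # Sort same-class patterns by squared Euclidean distance to `base`.
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        mate = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> mate
        synthetic.append([a + gap * (b - a) for a, b in zip(base, mate)])
    return synthetic


# Toy example with hypothetical labels for the three farm classes.
y_true = ["eff", "eff", "int", "int", "int",
          "ineff", "ineff", "ineff", "ineff", "ineff"]
y_pred = ["eff", "int", "int", "int", "ineff",
          "ineff", "ineff", "ineff", "ineff", "ineff"]
worst_class, worst_sens = minimum_sensitivity_class(y_true, y_pred)
print(worst_class, worst_sens)
```

In a dynamic setting, a step like this would be repeated at selected generations: the best network of the population is evaluated, the class with the lowest sensitivity is identified, and synthetic patterns are added to that class before evolution continues.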
