A data-driven fuzzy modelling framework for the classification of imbalanced data

The design and implementation of Data-Driven Fuzzy Models (DDFMs) for learning from balanced industrial/manufacturing data has proven to be a popular machine learning methodology. However, DDFMs have also been shown to perform poorly when learning from heavily imbalanced data. In this study we propose a DDFM framework to tackle a two-class imbalanced case study on rail quality. We integrate a number of machine learning methods, namely Granular Computing (GrC), RBF Neural Networks (RBF-NN) and Feature Selection (FS), to create a DDFM framework that is sensitive to imbalanced data. The rationale behind the framework can be divided into three main stages. In the first stage, a Fast Correlation-Based Filter (FCBF) is employed to select the most representative features. Subsequently, iterative granulation is applied to group (cluster) the rail data set. Granulation yields a number of information granules, which can be viewed as fuzzy constraints. In the second stage, an RBF-NN is used as a Neural Fuzzy Model (NFM) whose initial parameters are those of the fuzzy sets created during the granulation process. Finally, a twofold bootstrapping strategy is applied. On the one hand, bootstrapping is used to balance the ratio between the majority and minority classes. On the other hand, bootstrapping is used to estimate the appropriate number of fuzzy linguistic rules in the NFM. The proposed modelling framework is tested on a manufacturing case study provided by TATA Steel, UK.
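The class-balancing role of bootstrapping mentioned above can be illustrated with a minimal sketch. The abstract does not specify the resampling scheme, so this example assumes simple over-sampling: every class is resampled with replacement up to the size of the majority class. The function name `bootstrap_balance`, the `seed` parameter and the toy data are hypothetical and not taken from the study.

```python
import random
from collections import Counter


def bootstrap_balance(samples, labels, seed=0):
    """Resample each class with replacement up to the majority-class size.

    Illustrative assumption only: the study's actual bootstrapping scheme
    is not described in the abstract.
    """
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    # Every class is drawn up to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    balanced = []
    for y, rows in by_class.items():
        # Bootstrap draw: `target` rows sampled with replacement.
        drawn = rng.choices(rows, k=target)
        balanced.extend((x, y) for x in drawn)

    rng.shuffle(balanced)
    xs = [x for x, _ in balanced]
    ys = [y for _, y in balanced]
    return xs, ys


# Hypothetical toy data: 95 majority-class rows (label 0), 5 minority rows (label 1).
X = [[float(i)] for i in range(100)]
y = [0] * 95 + [1] * 5

Xb, yb = bootstrap_balance(X, y)
print(Counter(yb))  # both classes now have the same number of rows
```

Because minority rows are repeated rather than synthesised, a model trained on the balanced set sees no new information about the minority class; the resampling only changes how strongly each class weighs on the fit.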
