Adaptive neural-fuzzy inference system for classification of rail quality data with bootstrapping-based over-sampling

This paper presents an iterative bootstrapping-based data over-sampling strategy, combined with an adaptive neural-fuzzy inference system (ANFIS), for a severely imbalanced data modelling problem. Real industrial data sets are often very large, containing hundreds of process variables and a huge number of records, so selecting a compact set of input variables is critical to successful modelling and analysis. Considerable effort is therefore devoted to identifying the most relevant input variables through correlation analysis and neural-network-based forward input selection. The majority-to-minority class ratio, which controls the level of data imbalance in the training set, is then tuned through the iterative bootstrapping process so that the combined sensitivity and specificity performance is optimised. The resulting iterative bootstrapping ANFIS modelling strategy is applied to a real industrial case study on rail quality classification, using data provided by Tata Steel Europe. Preliminary results indicate good overall classification performance for the ANFIS model trained with iterative bootstrapping over-sampling.
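The ratio-selection loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a one-dimensional decision stump stands in for ANFIS, the combined score is assumed to be the plain sum of sensitivity and specificity (the abstract does not define it), and all function names, candidate ratios and synthetic data are hypothetical.

```python
import random


def bootstrap_oversample(majority, minority, ratio, rng):
    """Resample the minority class with replacement until
    len(majority) / len(result) is roughly `ratio` (never shrinks it)."""
    target = max(len(minority), round(len(majority) / ratio))
    extra = [rng.choice(minority) for _ in range(target - len(minority))]
    return minority + extra


def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    Label 1 is the minority (positive) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def fit_stump(xs, ys):
    """Stand-in classifier (NOT ANFIS): pick the threshold that minimises
    training errors when predicting class 1 for x > threshold."""
    best_thr, best_err = float("-inf"), None
    for thr in [float("-inf")] + sorted(set(xs)):
        err = sum((x > thr) != (y == 1) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_thr, best_err = thr, err
    return best_thr


def select_ratio(train_major, train_minor, val_x, val_y, ratios, seed=0):
    """Try each candidate majority:minority ratio and keep the one with the
    highest sensitivity + specificity on the validation data (assumed score)."""
    rng = random.Random(seed)
    best = None
    for ratio in ratios:
        minor = bootstrap_oversample(train_major, train_minor, ratio, rng)
        xs = train_major + minor
        ys = [0] * len(train_major) + [1] * len(minor)
        thr = fit_stump(xs, ys)
        pred = [1 if x > thr else 0 for x in val_x]
        sens, spec = sensitivity_specificity(val_y, pred)
        score = sens + spec
        if best is None or score > best[1]:
            best = (ratio, score, sens, spec)
    return best


if __name__ == "__main__":
    # Synthetic, overlapping 1-D classes; purely illustrative data.
    rng = random.Random(42)
    train_major = [rng.gauss(0.0, 1.0) for _ in range(200)]
    train_minor = [rng.gauss(2.0, 1.0) for _ in range(10)]
    val_x = [rng.gauss(0.0, 1.0) for _ in range(100)]
    val_x += [rng.gauss(2.0, 1.0) for _ in range(10)]
    val_y = [0] * 100 + [1] * 10
    print(select_ratio(train_major, train_minor, val_x, val_y, [20, 10, 5, 2, 1]))
```

The stump is used only because its threshold shifts visibly as the minority class is replicated, which makes the effect of the ratio easy to see; any classifier with a fit and predict step could be substituted in `select_ratio`.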
