Hybrid rebalancing approach to handle imbalanced dataset for fault diagnosis in manufacturing systems

In a mature manufacturing system, fault conditions occur rarely, so the majority of the data collected from such systems reflects normal operating behaviour. This creates an imbalance between the class distributions of the data: the ratio of fault-condition to normal-condition samples may fall in the range of 1:100 to 1:1000. Such datasets make it harder to build reliable models for accurate fault diagnosis in Condition-Based Maintenance (CBM), because there are too few learning exemplars of the fault class. Conventional machine learning algorithms do not handle imbalanced datasets well and generally produce poor classification results. To improve fault diagnosis reliability on class-imbalanced datasets, this paper proposes a hybrid rebalancing approach that combines Support Vector Machine (SVM) undersampling with Mega Trend Diffusion (MTD) oversampling. The proposed approach rebalances the dataset by (1) reducing the amount of normal-condition data whilst retaining the most informative samples and (2) boosting the number of fault-condition samples to match the size of the normal data. This approach is well suited to the manufacturing setting because the data has a degree of predictability, i.e. data from different fault conditions tend to cluster together in the feature space, which makes manipulating the data at this level a logical step. Learning effectively from the limited fault data available can therefore translate into significant cost savings. The approach is demonstrated and validated with a case study on bearing fault detection. Finally, conclusions and future work are discussed.
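The two rebalancing steps can be illustrated with a minimal Python sketch; it is not the paper's implementation. Two assumptions are made. First, the abstract does not say how the SVM selects the most informative normal samples, so the undersampling step uses a plainly different stand-in: it keeps the normal samples closest to the fault-class centroid. Second, the oversampling step uses a commonly cited form of the mega-trend-diffusion bounds (a skewness-weighted spread with the constant ln(10^-20)) and samples each feature independently from a triangular membership function; the exact formula should be checked against the original MTD work. All function names and the toy data are hypothetical.

```python
import math
import random


def mtd_bounds(values):
    """Estimate a diffused domain (a, center, b) for one feature.

    Assumed form of the mega-trend-diffusion bounds; verify against the
    original MTD publication before relying on it.
    """
    lo, hi = min(values), max(values)
    center = (lo + hi) / 2.0
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else 0.0
    n_low = sum(1 for v in values if v < center)
    n_up = sum(1 for v in values if v > center)
    total = max(n_low + n_up, 1)          # guard against division by zero
    spread = -2.0 * math.log(1e-20)       # positive diffusion constant
    a = center - (n_low / total) * math.sqrt(spread * var / max(n_low, 1))
    b = center + (n_up / total) * math.sqrt(spread * var / max(n_up, 1))
    # The diffused domain must at least cover the observed range.
    return min(a, lo), center, max(b, hi)


def mtd_oversample(fault, n_new, rng):
    """Draw n_new synthetic fault samples, one feature at a time.

    Simplification: features are sampled independently, so correlations
    between features are not preserved.
    """
    bounds = [mtd_bounds(col) for col in zip(*fault)]
    return [[rng.triangular(a, b, c) for a, c, b in bounds]
            for _ in range(n_new)]


def undersample_nearest(normal, fault, n_keep):
    """Stand-in for SVM undersampling (NOT an SVM): keep the n_keep
    normal samples closest to the fault-class centroid."""
    dims = len(fault[0])
    centroid = [sum(row[d] for row in fault) / len(fault) for d in range(dims)]
    ranked = sorted(normal, key=lambda row: math.dist(row, centroid))
    return ranked[:n_keep]


def hybrid_rebalance(normal, fault, n_keep, seed=0):
    """Shrink the normal class to n_keep, then grow the fault class to match."""
    rng = random.Random(seed)
    kept = undersample_nearest(normal, fault, n_keep)
    synthetic = mtd_oversample(fault, max(n_keep - len(fault), 0), rng)
    return kept, fault + synthetic


# Toy data with a 1:100 imbalance: 10 fault samples, 1000 normal samples.
data_rng = random.Random(42)
normal = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(1000)]
fault = [[data_rng.gauss(3.0, 0.5), data_rng.gauss(3.0, 0.5)] for _ in range(10)]

kept, balanced_fault = hybrid_rebalance(normal, fault, n_keep=50)
print(len(kept), len(balanced_fault))  # 50 50
```

In this sketch the retained normal samples are those lying nearest the fault cluster, which is one reading of "most informative"; an SVM-based selection would instead be driven by the learned decision boundary.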
