Diagnosis of Rolling Bearing Based on Classification for High Dimensional Unbalanced Data

Motor systems are increasingly vital in modern manufacturing, and bearings play an important role in the performance of a motor system. Many problems that arise in motor operation are related to bearing faults. In many cases, the accuracy of the devices that monitor or control a motor system depends heavily on the dynamic properties of the motor bearings. Fault diagnosis of a motor system is therefore inseparable from diagnosis of the bearing assembly. The fault diagnosis of rolling bearings is essentially a classification problem. Traditional applications of random forest (RF) to fault diagnosis assume balanced data. In practice, however, fault data are difficult to collect, so the available data are usually unbalanced. To address this problem, we first propose a two-step (TS) clustering algorithm to enhance the original synthetic minority oversampling technique (SMOTE) algorithm for unbalanced data classification. Then, building on the improved SMOTE algorithm, we introduce principal component analysis (PCA) and apply it to high-dimensional unbalanced fault diagnosis data. In this paper, we apply the new method to the fault diagnosis of rolling bearings, and experiments show that the improved algorithm achieves better classification performance.
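For orientation, the core interpolation step of the original SMOTE algorithm can be sketched as follows. This is a minimal illustration of plain SMOTE only, not the TS-clustering-enhanced variant proposed in the paper; the function name `smote`, its parameters, and the toy data are assumptions made for this example.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples (plain SMOTE, illustrative).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority (fault) class, for illustration only.
minority = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote(minority, n_new=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority samples, the new points stay inside the region already occupied by the minority class. In a pipeline like the one described above, oversampling would be followed by PCA for dimensionality reduction and an RF classifier.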
