Integrating synthetic minority oversampling and gradient boosting decision tree for bogie fault diagnosis in rail vehicles

Bogies are critical components of a rail vehicle and are essential to the safe operation of rail transit. In this study, the authors analyzed real vibration data from the bogies of a railway vehicle, obtained from a Chinese subway company, under four different operating conditions. They selected 15 feature indexes spanning the time domain, energy, and entropy, and examined the correlations among them. An adaptive synthetic sampling approach–gradient boosting decision tree (ADASYN–GBDT) method is proposed for bogie fault diagnosis. ADASYN–GBDT was compared with three commonly used classifiers (K-nearest neighbor, support vector machine, and Gaussian naïve Bayes), each combined with random forest for feature selection, under different test data sizes, and a confusion matrix was used to evaluate all of the classifiers. K-nearest neighbor, support vector machine, and Gaussian naïve Bayes require the optimal features to be selected first, whereas the proposed method does not, and these three classifiers produced inaccurate results in multi-class identification. The lowest detection accuracies of the proposed ADASYN–GBDT model are 92.95% and 87.81% when the proportion of the test dataset is 0.4 and 0.9, respectively. In addition, the ADASYN–GBDT model can correctly identify the fault class, which makes it more practical and better suited to railway operations. The entire process (training and testing) was completed in 2.4231 s, and the detection procedure took 0.0027 s on average. The results show that the proposed ADASYN–GBDT method satisfies the real-time performance and accuracy requirements of online fault detection and may therefore aid in the fault detection of bogies.
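
The oversampling half of ADASYN–GBDT can be illustrated with a minimal sketch. It assumes a binary setting (one minority class versus everything else) and follows the original ADASYN formulation of He et al. (2008): minority samples surrounded by more majority-class neighbours receive more synthetic samples. The function name `adasyn` and its parameters (`k`, `beta`, `seed`) are illustrative assumptions, not the authors' code; handling the four operating conditions would mean applying the step once per minority class.

```python
import math
import random


def _distance(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def adasyn(X, y, minority_label, k=5, beta=1.0, seed=0):
    """Binary ADASYN sketch: returns (X_new, y_new) = originals followed by synthetic minority samples."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority_label]
    n_min, n_maj = len(min_idx), len(y) - len(min_idx)
    # G: total number of synthetic samples; beta = 1.0 aims for a fully balanced set.
    G = int(round((n_maj - n_min) * beta))
    X_new, y_new = [list(x) for x in X], list(y)
    if G <= 0 or n_min < 2:
        return X_new, y_new

    # Difficulty ratio r_i: share of majority samples among the k nearest neighbours of x_i.
    ratios = []
    for i in min_idx:
        others = sorted((j for j in range(len(X)) if j != i), key=lambda j: _distance(X[i], X[j]))
        neighbours = others[:k]
        ratios.append(sum(1 for j in neighbours if y[j] != minority_label) / len(neighbours))
    total = sum(ratios)
    if total == 0:
        raise ValueError("no minority sample has majority-class neighbours; ADASYN is undefined")

    # g_i = (r_i / sum(r)) * G synthetic samples per minority sample (rounded, so the total is about G).
    for i, r in zip(min_idx, ratios):
        g = int(round(r / total * G))
        mates = sorted((j for j in min_idx if j != i), key=lambda j: _distance(X[i], X[j]))[:k]
        for _ in range(g):
            z = X[rng.choice(mates)]
            lam = rng.random()  # uniform in [0, 1)
            # New point on the segment between x_i and a random minority neighbour.
            X_new.append([a + lam * (b - a) for a, b in zip(X[i], z)])
            y_new.append(minority_label)
    return X_new, y_new


X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [0.5, 0.5], [0.45, 0.55], [0.9, 0.9]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = adasyn(X, y, minority_label=1, k=5)
```

Because every synthetic point is an interpolation between two minority samples, it stays inside the region the minority class already occupies; the adaptive weighting only decides where along that region new samples are concentrated.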

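The classification and evaluation half can likewise be sketched, under stated assumptions: binary 0/1 labels, depth-1 regression trees (stumps), and log loss, in the spirit of Friedman's gradient boosting machine, where each round fits a tree to the negative gradient and sets each leaf by one Newton step. A confusion-matrix helper is included for evaluation. The names `StumpGBDT` and `confusion_matrix` and all hyperparameter values are hypothetical; a multi-class GBDT for four operating conditions would typically fit one tree per class in each round under a softmax loss.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_stump(X, residuals, hessians):
    """Least-squares stump on the pseudo-residuals; each leaf value is one Newton step
    sum(residual) / sum(p * (1 - p))."""

    def leaf_value(idx):
        num = sum(residuals[i] for i in idx)
        den = sum(hessians[i] for i in idx)
        return num / den if den > 1e-12 else 0.0

    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not right:  # threshold at the maximum value: no split
                continue
            sse = 0.0
            for side in (left, right):
                mean = sum(residuals[i] for i in side) / len(side)
                sse += sum((residuals[i] - mean) ** 2 for i in side)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:  # every feature is constant: a single leaf
        v = leaf_value(range(len(X)))
        return (0, math.inf, v, v)
    _, f, t, left, right = best
    return (f, t, leaf_value(left), leaf_value(right))


def _stump_output(stump, row):
    f, t, left_value, right_value = stump
    return left_value if row[f] <= t else right_value


class StumpGBDT:
    """Binary gradient boosting with log loss and depth-1 trees (labels must be 0/1)."""

    def __init__(self, n_estimators=50, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.init_score = math.log(p0 / (1 - p0))  # log-odds of the positive class
        F = [self.init_score] * len(X)
        self.stumps = []
        for _ in range(self.n_estimators):
            p = [_sigmoid(s) for s in F]
            residuals = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
            hessians = [pi * (1 - pi) for pi in p]
            stump = _fit_stump(X, residuals, hessians)
            self.stumps.append(stump)
            F = [s + self.learning_rate * _stump_output(stump, row) for s, row in zip(F, X)]
        return self

    def decision_function(self, X):
        return [self.init_score + self.learning_rate * sum(_stump_output(s, row) for s in self.stumps)
                for row in X]

    def predict(self, X):
        return [1 if score > 0 else 0 for score in self.decision_function(X)]


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: k for k, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = StumpGBDT(n_estimators=20).fit(X, y)
cm = confusion_matrix(y, model.predict(X), labels=[0, 1])
accuracy = sum(cm[k][k] for k in range(2)) / len(y)
```

Accuracy is the trace of the confusion matrix divided by the number of samples; the detection rate for one class is its diagonal entry divided by its row sum.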