Factor Analysis in Fault Diagnostics Using Random Forest

Factor analysis, sometimes referred to as variable analysis, has been used extensively in classification problems to identify the factors that are significant to particular classes. It has been widely applied in areas such as customer segmentation, medical research, and network traffic, image, and video classification. Today, factor analysis is prominently used in machine fault diagnosis to identify the significant factors behind a specific machine fault and to study its root cause. Its advantage in machine maintenance is that it supports prescriptive analysis (what actions should be taken?) and preemptive analysis (how can the failure mode be eliminated?). In this paper, a real case of an industrial rotating machine is considered, in which vibration and ambient temperature data were collected to monitor the health of the machine. Gaussian mixture model-based clustering was used to partition the data into significant groups, and spectrum analysis was used to assign each cluster to a specific state of the machine. The features that contribute significantly to each mode of the machine were then identified using a random forest classification model. These mode-specific features were used to conclude that the generated clusters are distinct and that each has a unique set of significant features.
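The pipeline described above (model-based clustering followed by a random forest that ranks features for each cluster) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python with scikit-learn, uses synthetic data in place of the real vibration and temperature measurements, and the feature names, the number of modes, and the one-vs-rest scheme for obtaining per-cluster importances are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.mixture import GaussianMixture

# Hypothetical feature names; the paper's actual feature set is not given here.
FEATURES = ["vib_rms", "vib_peak", "ambient_temp", "noise"]


def make_synthetic_data(n_per_mode=200, seed=0):
    """Simulate three machine modes that differ in vibration or temperature."""
    rng = np.random.default_rng(seed)
    # One row of feature means per mode; the last feature is pure noise.
    means = np.array([
        [0.0, 0.0, 0.0, 0.0],  # baseline
        [6.0, 6.0, 0.0, 0.0],  # elevated vibration
        [0.0, 0.0, 6.0, 0.0],  # elevated ambient temperature
    ])
    blocks = [rng.normal(m, 1.0, size=(n_per_mode, len(FEATURES))) for m in means]
    return np.vstack(blocks)


def cluster_modes(X, n_modes=3, seed=0):
    """Group observations with a Gaussian mixture model (unsupervised)."""
    gmm = GaussianMixture(n_components=n_modes, n_init=5, random_state=seed)
    return gmm.fit_predict(X)


def per_cluster_importance(X, labels, seed=0):
    """Fit one one-vs-rest random forest per cluster and collect importances."""
    rows = []
    for k in np.unique(labels):
        forest = RandomForestClassifier(n_estimators=200, random_state=seed)
        forest.fit(X, labels == k)  # binary target: "is this cluster k?"
        rows.append(forest.feature_importances_)
    return np.array(rows)  # shape: (n_clusters, n_features)


if __name__ == "__main__":
    X = make_synthetic_data()
    labels = cluster_modes(X)
    importance = per_cluster_importance(X, labels)
    for k, row in enumerate(importance):
        ranked = [FEATURES[i] for i in np.argsort(row)[::-1]]
        print(f"cluster {k}: features ranked by importance: {ranked}")
```

Fitting a separate one-vs-rest forest per cluster is only one way to obtain cluster-specific rankings. In this sketch, the spectrum analysis step that maps each cluster to a physical machine state is omitted, so the cluster indices carry no meaning beyond the synthetic data.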
