Fault Diagnosis Using Clustering. What Statistical Test to use for Hypothesis Testing?

Predictive maintenance and condition-based monitoring systems have gained prominence in recent years as a way to minimize the impact of machine downtime on production and on costs. Predictive maintenance uses concepts from data mining, statistics, and machine learning to build models capable of early fault detection, fault diagnosis, and prediction of the time to failure. Fault diagnosis, in which the actual failure mode of the machine is identified, has been one of its core areas. In fluctuating environments such as manufacturing, clustering techniques have proved more reliable than supervised learning methods. One of the fundamental challenges of clustering is formulating a test hypothesis and choosing an appropriate statistical test to evaluate it. Most statistical analyses rest on underlying assumptions about the data that most real-world data cannot satisfy. This paper addresses that challenge by formulating a test hypothesis for a fault diagnosis application built on a clustering technique and by using the PERMANOVA (permutational multivariate analysis of variance) test for hypothesis testing.
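PERMANOVA avoids the distributional assumptions mentioned above by building its null distribution from label permutations instead of assuming normality. The block below is a minimal sketch of a one-way PERMANOVA following Anderson's (2001) pseudo-F formulation. It is not the paper's implementation. It assumes Euclidean distances between feature vectors and cluster labels produced by some earlier clustering step (for example k-means). The function names `squared_distances`, `pseudo_f`, and `permanova` and the defaults `n_perm=999` and `seed=0` are hypothetical choices made for illustration. The null hypothesis here is that the cluster labels explain no more variation in the distances than a random relabelling would, meaning the clusters share a common multivariate centroid.

```python
import random
from itertools import combinations


def squared_distances(points):
    """Pairwise squared Euclidean distances d_ij^2, keyed by index pair (i, j) with i < j."""
    return {
        (i, j): sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
        for i, j in combinations(range(len(points)), 2)
    }


def pseudo_f(d2, labels):
    """One-way PERMANOVA pseudo-F: (SS_among / (a - 1)) / (SS_within / (N - a))."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: (1/N) times the sum of all squared distances.
    ss_total = sum(d2.values()) / n
    # Within-group sum of squares: each group's squared distances scaled by 1/n_g.
    sizes = {g: labels.count(g) for g in groups}
    ss_within = sum(
        d / sizes[labels[i]]
        for (i, j), d in d2.items()
        if labels[i] == labels[j]
    )
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(points, labels, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value) for the given cluster labels.

    Hypothetical helper for illustration; labels come from any clustering step.
    """
    rng = random.Random(seed)
    d2 = squared_distances(points)  # distances are fixed; only labels are permuted
    f_obs = pseudo_f(d2, list(labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d2, shuffled) >= f_obs:
            hits += 1
    # Count the observed labelling as one member of the reference distribution.
    p_value = (hits + 1) / (n_perm + 1)
    return f_obs, p_value


if __name__ == "__main__":
    # Toy two-feature data: two well-separated groups of four points each.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    labs = [0, 0, 0, 0, 1, 1, 1, 1]
    f, p = permanova(pts, labs)
    # pseudo-F is about 600 here; p is small because only 2 of the 70 ways to
    # assign four 0s and four 1s to these points reproduce this partition.
    print(f, p)
```

Permuting labels rather than data keeps the distance matrix fixed, so any distance measure could replace the Euclidean one without changing the test logic. With very small samples, as in the toy example, the number of distinct labellings bounds how small the p-value can get.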
