Identification of critical inspection samples among railroad wheels by similarity-based agglomerative clustering

This paper proposes an unsupervised analysis method for identifying critical samples in large populations. The objective is to identify data features that help pinpoint the critical samples requiring the most inspection resources, namely time and money. The data typically available in industry for deriving optimized inspection schedules include both numeric and nominal features, whereas most clustering and classification algorithms are tailored to either numeric or nominal data alone. In this work, we adopt the Similarity-Based Agglomerative Clustering (SBAC) algorithm, which has been shown to be effective in clustering data with mixed numeric and nominal features. We demonstrate the effectiveness of this approach by applying it to an important problem in the railroad industry: the inspection of railroad wheels.
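The central idea is that a single similarity must cover both numeric and nominal features before agglomerative merging can proceed. The following is a minimal sketch of that idea, not SBAC itself. SBAC uses a probability-based similarity that gives more weight to matches on uncommon values. The sketch substitutes the simpler Gower-style coefficient and uses average linkage. The similarity choice, the linkage choice, the function names `gower_similarity` and `agglomerate`, and the wheel feature names are all hypothetical.

```python
from itertools import combinations


def gower_similarity(a, b, numeric, nominal, ranges):
    """Gower-style similarity in [0, 1] between two mixed-type records.

    Numeric features score 1 - |a - b| / range; nominal features score
    1 on an exact match and 0 otherwise. Scores are averaged over all features.
    (A stand-in for SBAC's probability-based measure, not that measure.)
    """
    total = 0.0
    for f in numeric:
        r = ranges[f]
        total += 1.0 - (abs(a[f] - b[f]) / r if r > 0 else 0.0)
    for f in nominal:
        total += 1.0 if a[f] == b[f] else 0.0
    return total / (len(numeric) + len(nominal))


def agglomerate(records, numeric, nominal, n_clusters):
    """Bottom-up clustering: repeatedly merge the two most similar clusters
    (average linkage, an assumption) until n_clusters remain.
    Returns lists of record indices."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Per-feature ranges normalise numeric differences into [0, 1].
    ranges = {f: max(r[f] for r in records) - min(r[f] for r in records)
              for f in numeric}
    n = len(records)
    sim = [[gower_similarity(records[i], records[j], numeric, nominal, ranges)
            for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1, c2):
        # Average pairwise similarity between members of the two clusters.
        return sum(sim[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        p, q = max(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] += clusters[q]  # p < q, so deleting q leaves p's index valid
        del clusters[q]
    return clusters


# Hypothetical wheel records; feature names and values are illustrative only.
wheels = [
    {"mileage": 120.0, "load": 30.0, "steel_class": "C", "service": "coal"},
    {"mileage": 125.0, "load": 31.0, "steel_class": "C", "service": "coal"},
    {"mileage": 40.0, "load": 20.0, "steel_class": "B", "service": "mixed"},
    {"mileage": 45.0, "load": 21.0, "steel_class": "B", "service": "mixed"},
]
groups = agglomerate(wheels, ["mileage", "load"], ["steel_class", "service"], 2)
print(groups)  # [[0, 1], [2, 3]]
```

The naive merge loop rescans every cluster pair on each step. That cost is acceptable for illustration, but not for large populations of wheels. A practical implementation would update the similarity matrix incrementally instead.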
