A fuzzy c-means algorithm based on the relationship among attributes of data and its application in tunnel boring machine

Abstract In recent years, large volumes of operation data from engineering systems have been measured and recorded, which has promoted the development of engineering data mining. However, the operating state of an engineering system usually changes greatly, so the patterns in its operation data vary considerably as well. Partitioning these data can therefore provide useful references for the design and analysis of engineering systems. In this paper, a new clustering algorithm based on support vector regression and the fuzzy c-means algorithm (SVR–FCM) is proposed to accomplish this work. The SVR–FCM algorithm is built on the framework of the fuzzy c-means algorithm (FCM), in which the differences between clusters are evaluated by the relationship among the attributes of the data. In the proposed algorithm, support vector regression (SVR) is used to describe the relationship among the attributes of the data, and an alternating optimization method is designed to optimize the newly designed clustering objective function. A series of experiments on synthetic and real-world datasets is conducted to evaluate the performance of the SVR–FCM algorithm; the results show that it is more effective than the other popular clustering algorithms it is compared with. The SVR–FCM algorithm is then applied to a tunnel boring machine (TBM) operation dataset collected from a real TBM project in China. The experimental results show that the proposed algorithm performs well in clustering TBM operation data. This paper also highlights the applicability and potential of data clustering in the analysis of other complex engineering systems similar to TBMs.
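
For orientation, the FCM framework and its alternating optimization can be sketched in a few lines of NumPy. This is a minimal sketch of standard FCM with squared Euclidean distance, not the SVR–FCM algorithm itself. The part of SVR–FCM that evaluates cluster differences through SVR-modelled attribute relationships is not reproduced here, because the abstract does not give its form. The function name `fuzzy_c_means` and the parameters `m`, `n_iter`, `tol`, and `seed` are illustrative assumptions.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard FCM via alternating optimization (illustrative sketch).

    X: (n_samples, n_features) array, c: number of clusters,
    m: fuzzifier (> 1). Returns memberships U (n_samples, c)
    and cluster centers (c, n_features).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Random initial memberships; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Step 1: fix memberships, update centers as weighted means.
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]

        # Step 2: fix centers, update memberships from squared distances.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)

        # Stop once the memberships no longer change noticeably.
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break

    return U, centers
```

Calling `fuzzy_c_means(X, c=2)` on an array of samples returns a soft partition. Taking `U.argmax(axis=1)` gives hard labels if they are needed.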
