Cluster ensemble framework based on the group method of data handling

Graphical abstract
CE-GMDH contains the following three components: (1) initial solutions, (2) a transfer function (the mechanism for the mutation of this organisation), and (3) an external criterion (the selection mechanism). Three CE-GMDH models are proposed in this study, one for each transfer function.

Highlights
A cluster ensemble framework based on the group method of data handling is proposed.
The components of CE-GMDH can be chosen according to the target of the application.
Three novel transfer functions for CE-GMDH are proposed.
CE-GMDH outperforms the other cluster ensemble algorithms and frameworks.

Cluster ensembles are a powerful method for improving both the robustness and the stability of unsupervised classification solutions. This paper introduces the group method of data handling (GMDH) to cluster ensembles and proposes a new framework, named the cluster ensemble framework based on the group method of data handling (CE-GMDH). CE-GMDH consists of three components: an initial solution, a transfer function and an external criterion. Several CE-GMDH models can be built from different types of transfer functions and external criteria. In this study, three novel models are proposed, each based on a different transfer function: the least squares approach, the cluster-based similarity partitioning algorithm, and semidefinite programming. The performance of CE-GMDH is compared across these transfer functions, and against several state-of-the-art cluster ensemble algorithms and cluster ensemble frameworks, on synthetic and real datasets. The experimental results demonstrate that CE-GMDH, through its unique modelling process, can improve the performance of the cluster ensemble algorithms that are used as its transfer functions. They also indicate that CE-GMDH achieves results that are better than or comparable to those of the other cluster ensemble algorithms and frameworks.
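
One of the three transfer functions named above, the cluster-based similarity partitioning algorithm, works from a pairwise similarity matrix that records how often two objects are placed in the same cluster across the base partitions. The following is a minimal Python sketch of that matrix only. The function name `co_association` and the toy label vectors are illustrative assumptions, not taken from the paper, and the sketch covers neither the subsequent graph-partitioning step nor any part of the GMDH modelling process.

```python
import numpy as np

def co_association(partitions):
    """Return the fraction of base partitions in which each pair of
    objects shares a cluster (hypothetical helper, for illustration)."""
    labels = np.asarray(partitions)  # shape: (n_partitions, n_objects)
    # same[p, i, j] is True when partition p puts objects i and j together
    same = labels[:, :, None] == labels[:, None, :]
    return same.mean(axis=0)         # average over the base partitions

# Toy ensemble: three base partitions of four objects (made-up labels).
# Label values are arbitrary names, so partitions 1 and 2 are equivalent.
base_partitions = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

S = co_association(base_partitions)
print(np.round(S, 2))
```

Because the matrix depends only on whether two objects share a label, it is unaffected by how each base partition happens to number its clusters, which is why it is a convenient common representation for combining partitions.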
