Decoy Meta–Clustering Through Rough Graded Possibilistic C-Medoids

Current ab initio methods for protein structure prediction explore multiple simulated conformations, called decoys, to generate families of folds, one of which is the closest to the native one. To limit the exploration of the conformational space, clustering algorithms are routinely applied to group similar decoys and then find the most plausible cluster centroid, based on the hypothesis that more low–energy conformations surround the native fold than surround other folds. However, different clustering algorithms, or the same algorithm with different parameters, are likely to output different partitions of the input data, and choosing only one of the possible solutions can be too restrictive and unreliable. Meta–clustering algorithms reconcile multiple clustering solutions by grouping them into meta–clusters (i.e. clusters of clusterings), so that similar partitions fall in the same meta–cluster. This paper proposes meta–clustering for the selection of the lowest–energy decoys and tests the Rough Graded Possibilistic c-medoids clustering algorithm for both baseline clustering and meta–clustering. Preliminary tests on real data suggest that meta–clustering is effective in reducing sensitivity to the parameters of the clustering algorithm and in expanding the explored space.
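
The idea of "clusters of clusterings" can be illustrated with a minimal sketch. It is not the method tested in the paper: it assumes the plain Rand index as the similarity between partitions and a crisp k-medoids in place of the Rough Graded Possibilistic c-medoids, and the four toy partitions of six decoys, the function names and the initial medoids are all hypothetical.

```python
from itertools import combinations


def rand_index(a, b):
    """Rand index between two labelings of the same items (1.0 = same partition)."""
    pairs = list(combinations(range(len(a)), 2))
    # A pair agrees when both partitions co-cluster it or both separate it.
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def partition_distances(partitions):
    """Pairwise distance between partitions, taken here as 1 - Rand index."""
    n = len(partitions)
    return [[1.0 - rand_index(partitions[p], partitions[q]) for q in range(n)]
            for p in range(n)]


def assign(dist, medoids):
    """Assign every item to its nearest medoid."""
    return [min(range(len(medoids)), key=lambda k: dist[i][medoids[k]])
            for i in range(len(dist))]


def k_medoids(dist, medoids, max_iter=20):
    """Crisp k-medoids on a precomputed distance matrix (illustrative stand-in)."""
    medoids = list(medoids)
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for k, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == k]
            if not members:  # keep the old medoid if its cluster emptied
                updated.append(old)
                continue
            # New medoid: the member with the smallest total distance to the others.
            updated.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids), medoids


# Hypothetical base clusterings of six decoys (one label per decoy).
partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],  # same partition as the first, labels swapped
    [0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 0],
]

dist = partition_distances(partitions)
meta_labels, meta_medoids = k_medoids(dist, medoids=[0, 2])
print(meta_labels, meta_medoids)  # [0, 0, 1, 1] [0, 2]
```

In this toy case the first two partitions, which differ only by label names, end up in one meta-cluster and the last two in another; each medoid partition can then serve as the representative clustering of its meta-cluster.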
