Voting-based consensus clustering for combining multiple clusterings of chemical structures

Background
Although many consensus clustering methods have been used successfully to combine multiple classifiers in areas such as machine learning, applied statistics, pattern recognition and bioinformatics, few have been applied to combining multiple clusterings of chemical structures. No single clustering method gives the best results for every type of application. In this paper, therefore, three consensus clustering methods, one voting-based and two graph-based, were used to combine multiple clusterings of chemical structures, with the aim of better separating biologically active molecules from inactive ones in each cluster.

Results
The cumulative voting-based aggregation algorithm (CVAA), the cluster-based similarity partitioning algorithm (CSPA) and the hypergraph partitioning algorithm (HGPA) were examined. The clusterings were evaluated with the F-measure and the Quality Partition Index (QPI), and the results were compared with those of Ward’s clustering method. The experiments used the MDL Drug Data Report (MDDR) dataset, represented by two 2D fingerprints, ALOGP and ECFP_4. The voting-based consensus clustering method outperformed Ward’s method on both the F-measure and QPI for both the ALOGP and ECFP_4 fingerprints, whereas the graph-based consensus clustering methods outperformed Ward’s method only for ALOGP under QPI. The Jaccard and Euclidean distance measures were the best choices for generating the ensembles, giving the highest values on both criteria.

Conclusions
The experimental results show that consensus clustering methods can improve the effectiveness of clustering chemical structures. CVAA performed best of the consensus clustering methods examined.
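The abstract names CSPA without describing it. As usually presented, CSPA turns an ensemble into a pairwise similarity matrix whose entries are the fraction of base clusterings that put two objects in the same cluster, and then partitions that matrix as a graph (the original formulation uses the METIS graph partitioner). The following is a minimal sketch, assuming each ensemble member is a list of integer cluster labels over the same molecules. For the second step it plainly substitutes a simpler rule for METIS: molecules co-clustered in more than half of the partitions are linked, and connected components become the consensus clusters. The function names, the 0.5 threshold and the toy ensemble are hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix: the fraction of partitions placing objects i and j together."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    # Divide once at the end so entries are exact fractions of m.
    return [[c / m for c in row] for row in counts]


def consensus_by_threshold(sim, threshold=0.5):
    """Stand-in for METIS: link pairs above `threshold`, return connected components."""
    n = len(sim)
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Three hypothetical base clusterings of five molecules.
ensemble = [[0, 0, 1, 1, 2], [0, 0, 0, 1, 1], [1, 1, 0, 0, 0]]
sim = co_association(ensemble)
print(consensus_by_threshold(sim))  # [0, 0, 1, 1, 1]
```

The thresholding rule cannot ask for a fixed number of clusters, which a graph partitioner such as METIS can; it is used here only because it keeps the sketch short and dependency-free.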
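Voting-based consensus has to solve a labelling problem first: cluster labels are arbitrary, so "cluster 1" in one partition need not correspond to "cluster 1" in another. A much simplified plurality-voting sketch can show the basic idea. Each partition is relabelled onto a reference partition (hypothetically, the first one) by mapping each of its clusters to the reference cluster it overlaps most, and every molecule then takes its majority label. This is not CVAA itself, which uses a more elaborate cumulative-voting scheme. All names and data here are illustrative.

```python
from collections import Counter


def relabel_to_reference(labels, reference):
    """Map each cluster label to the reference cluster it shares most objects with."""
    overlap = {}
    for lab, ref in zip(labels, reference):
        overlap.setdefault(lab, Counter())[ref] += 1
    mapping = {lab: counts.most_common(1)[0][0] for lab, counts in overlap.items()}
    return [mapping[lab] for lab in labels]


def plurality_vote(partitions):
    """Align every partition to the first one, then take a per-object majority vote."""
    reference = partitions[0]
    aligned = [relabel_to_reference(p, reference) for p in partitions]
    # Ties go to the label seen first (Counter.most_common keeps insertion order).
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*aligned)]


ensemble = [
    [0, 0, 1, 1, 2, 2],
    [5, 5, 7, 7, 7, 9],  # same structure, arbitrary label names
    [1, 1, 0, 0, 2, 2],  # labels permuted
]
print(plurality_vote(ensemble))  # [0, 0, 1, 1, 2, 2]
```

Because the mapping is many-to-one, two clusters in one partition can collapse onto the same reference cluster. Handling that case, and partitions with different numbers of clusters, is where more careful voting schemes differ from this toy version.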
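The abstract does not define the F-measure used to compare the clusterings. The sketch below assumes the widely used class-weighted form. For each activity class $c$ and cluster $k$, precision $P = n_{ck}/n_k$ and recall $R = n_{ck}/n_c$ are combined as $F = 2PR/(P+R)$. Each class is scored by its best-matching cluster, and the class scores are weighted by class size. The paper's exact variant (for example, which molecules are counted) may differ, and the function name and toy data are hypothetical.

```python
from collections import Counter


def clustering_f_measure(classes, clusters):
    """Class-size-weighted F-measure: each class is scored by its best-matching cluster."""
    n = len(classes)
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(classes, clusters))  # n_ck for every (class, cluster) pair
    total = 0.0
    for c, n_c in class_sizes.items():
        best = 0.0
        for k, n_k in cluster_sizes.items():
            n_ck = joint[(c, k)]
            if n_ck == 0:
                continue
            precision = n_ck / n_k
            recall = n_ck / n_c
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_c / n) * best
    return total


# Hypothetical activity classes and a clustering that splits class "a".
classes = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
print(round(clustering_f_measure(classes, clusters), 4))  # 0.8286
```

A clustering that reproduces the activity classes exactly scores 1.0. Splitting a class or mixing classes in one cluster lowers the score, which is the property used when comparing the consensus methods against Ward’s method.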
