Graph‐Based Consensus Clustering for Combining Multiple Clusterings of Chemical Structures

Consensus clustering methods have been used successfully to combine multiple classifiers in areas such as machine learning, applied statistics, pattern recognition and bioinformatics. In this paper, consensus clustering is used to combine clusterings of chemical structures, with the aim of better separating biologically active molecules from inactive ones within each cluster. Two graph‐based consensus clustering methods were examined. The clusterings were evaluated with the Quality Partition Index (QPI), and the results were compared with those of Ward's clustering method. A homogeneous and a heterogeneous subset (DS1–DS2) of the MDL Drug Data Report (MDDR) database were used for the experiments, each represented by two 2D fingerprints. Consensus partitions were built in two ways: by combining multiple runs of a single clustering method, and by combining single runs of several different clustering methods. In both cases the results showed that graph‐based consensus clustering can improve the effectiveness of chemical structure clustering.
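To make the idea of combining several clusterings concrete, the following is a minimal sketch under stated assumptions, not the method evaluated in the paper. The abstract does not name its two graph‐based methods, so this example swaps in a simple stand-in: it builds a co-association graph (edge weight is the fraction of input clusterings that place two molecules in the same cluster), keeps edges above a threshold, and takes the connected components as the consensus clusters. The function names, the `threshold` parameter and the toy label vectors are all hypothetical.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of
    input partitions that put items i and j in the same cluster."""
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    sim = [[c / m for c in row] for row in counts]
    for i in range(n):
        sim[i][i] = 1.0  # an item always co-occurs with itself
    return sim


def consensus_partition(partitions, threshold=0.5):
    """Assumed stand-in for a graph-based consensus function: link items
    whose co-association exceeds `threshold`, then label the connected
    components of the resulting graph."""
    n = len(partitions[0])
    sim = co_association(partitions)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if sim[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]


# Toy input: three clusterings of six items. Cluster labels are arbitrary,
# so the third run agrees with the first despite using swapped labels.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2, 2],
    [1, 1, 1, 0, 0, 0],
]
consensus = consensus_partition(partitions)
print(consensus)
```

The input partitions could come either from repeated runs of one clustering method or from single runs of different methods; the co-association step treats both cases the same way because it depends only on which items share a cluster, not on the label values.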
