Similarity-Based Virtual Screening Using Bayesian Inference Network: Enhanced Search Using 2D Fingerprints and Multiple Reference Structures

It is well known that different reference structures retrieve different sets of structures. Recent work in similarity searching suggests that significant improvements in retrieval effectiveness can be achieved by combining results from different reference structures. An important characteristic of the Bayesian inference network (BIN) model is that it permits the combination of multiple reference structures. In this paper we introduce a formal inference net model that directly combines the contributions of multiple reference structures, and we propose a novel approach to combining information from various reference structures. The inference net model of similarity, which was designed from this point of view, treats similarity searching as an evidential reasoning process in which multiple sources of evidence about a target structure are combined to estimate similarity scores. We compare BIN with other similarity searching methods when multiple bioactive reference structures are available. Six different 2D fingerprints were used in combination with data fusion (DF) and nearest neighbor (NN) approaches as search tools, and also as descriptors for BIN. Our empirical results show that BIN consistently outperformed conventional approaches such as DF and NN, regardless of the fingerprint tested. We ascribe the superiority of BIN over conventional approaches to the fact that BIN understands the content of the descriptors of the structures and references and uses this understanding to infer the direct relationship between structures and references.
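To make the idea of combining results from several reference structures concrete, the following is a minimal sketch of one conventional baseline: MAX-rule data fusion of Tanimoto similarities. It is not the BIN model, and it is not necessarily the exact DF configuration used in the study. Representing fingerprints as sets of on-bit indices is an assumption made for illustration, as are the function names and the toy molecules.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def max_fusion_rank(
    references: List[Set[int]], database: Dict[str, Set[int]]
) -> List[Tuple[str, float]]:
    """Score each database structure by its best similarity to any reference
    (MAX fusion rule), then rank in decreasing order of fused score."""
    fused = {
        name: max(tanimoto(fp, ref) for ref in references)
        for name, fp in database.items()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: two reference structures and three database structures.
references = [{1, 2, 3, 4}, {3, 4, 5, 6}]
database = {
    "mol_a": {1, 2, 3, 4, 7},
    "mol_b": {5, 6, 8},
    "mol_c": {9, 10},
}

ranking = max_fusion_rank(references, database)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy case mol_b shares no bits with the first reference and is retrieved only because of the second, which illustrates why different reference structures retrieve different sets of structures and why fusing their scores can help.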
