Enhancing the Effectiveness of Ligand‐Based Virtual Screening Using Data Fusion

Data fusion is being increasingly used to combine the outputs of different types of sensors. This paper reviews the application of the approach to ligand-based virtual screening, where the sensors to be combined are functions that score molecules in a database on their likelihood of exhibiting some required biological activity. Much of the literature to date involves the combination of multiple similarity searches, although there is also an increasing interest in the combination of multiple machine-learning techniques. Both approaches are reviewed here, focusing on the extent to which fusion can improve the effectiveness of searching when compared with a single screening mechanism, and on the reasons that have been suggested for the observed performance enhancement.
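As an illustration of what combining the outputs of several scoring functions can mean in practice, the following minimal sketch fuses two similarity searches in two ways: by summing each molecule's rank positions, and by taking each molecule's maximum score. The abstract does not name particular fusion rules, so both rules, the function names, and the toy scores are assumptions chosen for illustration only. The sketch also assumes that every search scores the same set of molecules and that a higher score means greater likelihood of activity.

```python
from typing import Dict, List


def ranks_from_scores(scores: Dict[str, float]) -> Dict[str, int]:
    """Convert scores to rank positions (1 = highest score; ties broken by name)."""
    ordered = sorted(scores, key=lambda mol: (-scores[mol], mol))
    return {mol: position + 1 for position, mol in enumerate(ordered)}


def fuse_sum_rank(score_lists: List[Dict[str, float]]) -> List[str]:
    """Rank-based fusion: order molecules by the sum of their ranks (lowest first)."""
    rank_lists = [ranks_from_scores(scores) for scores in score_lists]
    molecules = set(rank_lists[0])  # assumption: all searches cover the same molecules
    totals = {mol: sum(ranks[mol] for ranks in rank_lists) for mol in molecules}
    return sorted(molecules, key=lambda mol: (totals[mol], mol))


def fuse_max_score(score_lists: List[Dict[str, float]]) -> List[str]:
    """Score-based fusion: order molecules by their best score in any search."""
    molecules = set(score_lists[0])  # assumption: all searches cover the same molecules
    best = {mol: max(scores[mol] for scores in score_lists) for mol in molecules}
    return sorted(molecules, key=lambda mol: (-best[mol], mol))


# Hypothetical similarity scores from two different searches of one database.
search_a = {"m1": 0.9, "m2": 0.4, "m3": 0.7, "m4": 0.1}
search_b = {"m1": 0.3, "m2": 0.2, "m3": 0.8, "m4": 0.6}

fused_by_rank = fuse_sum_rank([search_a, search_b])
fused_by_score = fuse_max_score([search_a, search_b])
```

Summing ranks avoids having to put scores from different similarity measures on a common scale, whereas taking the maximum score presumes that the scores are directly comparable; which rule is preferable depends on the searches being combined.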
