GPU-accelerated Chemical Similarity Assessment for Large Scale Databases

The assessment of chemical similarity between molecules is a basic operation in chemoinformatics, a computational area concerning with the manipulation of chemical structural information. Comparing molecules is the basis for a wide range of applications such as searching in chemical databases, training prediction models for virtual screening or aggregating clusters of similar compounds. However, currently available multimillion databases represent a challenge for conventional chemoinformatics algorithms raising the necessity for faster similarity methods. In this paper, we extensively analyze the advantages of using many-core architectures for calculating some commonly-used chemical similarity coefficients such as Tanimoto, Dice or Cosine. Our aim is to provide a wide-breath proof-of-concept regarding the usefulness of GPU architectures to chemoinformatics, a class of computing problems still uncovered. In our work, we present a general GPU algorithm for all-to-all chemical comparisons considering both binary fingerprints and floating point descriptors as molecule representation. Subsequently, we adopt optimization techniques to minimize global memory accesses and to further improve efficiency. We test the proposed algorithm on different experimental setups, a laptop with a low-end GPU and a desktop with a more performant GPU. In the former case, we obtain a 4-to-6-fold speed-up over a single-core implementation for fingerprints and a 4-to-7-fold speed-up for descriptors. In the latter case, we respectively obtain a 195-to-206-fold speed-up and a 100-to-328-fold speed-up.

[1]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[2]  Klaus Schulten,et al.  Accelerating Molecular Modeling Applications with GPU Computing , 2009 .

[3]  David S Wishart,et al.  Introduction to cheminformatics. , 2007, Current protocols in bioinformatics.

[4]  Julian R. Ullmann,et al.  An Algorithm for Subgraph Isomorphism , 1976, J. ACM.

[5]  Darko Butina,et al.  Unsupervised Data Base Clustering Based on Daylight's Fingerprint and Tanimoto Similarity: A Fast and Automated Way To Cluster Small and Large Data Sets , 1999, J. Chem. Inf. Comput. Sci..

[6]  Giorgio Valle,et al.  CUDA compatible GPU cards as efficient hardware accelerators for Smith-Waterman sequence alignment , 2008, BMC Bioinformatics.

[7]  Peter Willett,et al.  Comparison of Ranking Methods for Virtual Screening in Lead-Discovery Programs , 2003, J. Chem. Inf. Comput. Sci..

[8]  Wen-mei W. Hwu,et al.  Optimization principles and application performance evaluation of a multithreaded GPU using CUDA , 2008, PPoPP.

[9]  G. Karypis,et al.  Acyclic Subgraph Based Descriptor Spaces for Chemical Compound Retrieval and Classification , 2006 .

[10]  Raman Sharma,et al.  ElectroShape: fast molecular similarity calculations incorporating shape, chirality and electrostatics , 2010, J. Comput. Aided Mol. Des..

[11]  Dong Xu,et al.  Advancements in Molecular Dynamics Simulations of Biomolecules on Graphical Processing Units , 2010 .

[12]  W. Graham Richards,et al.  Ultrafast shape recognition to search compound databases for similar molecular shapes , 2007, J. Comput. Chem..

[13]  Vijay S. Pande,et al.  SIML: A Fast SIMD Algorithm for Calculating LINGO Chemical Similarities on GPUs and CPUs , 2010, J. Chem. Inf. Model..

[14]  Ji-Bo Wang,et al.  GPU Accelerated Support Vector Machines for Mining High-Throughput Screening Data , 2009, J. Chem. Inf. Model..

[15]  Michael K. Gilson,et al.  Virtual Screening of Molecular Databases Using a Support Vector Machine , 2005, J. Chem. Inf. Model..

[16]  J. Andrew Grant,et al.  A fast method of molecular shape comparison: A simple application of a Gaussian description of molecular shape , 1996, J. Comput. Chem..

[17]  Lorenz C. Blum,et al.  970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. , 2009, Journal of the American Chemical Society.

[18]  Douglas M. Freimuth,et al.  Evaluating the Jaccard-Tanimoto Index on Multi-core Architectures , 2009, ICCS.