Using Deep Siamese Neural Networks to Speed up Natural Products Research

Natural products (NPs) are an important source of novel disease treatments. A bottleneck in the search for new NPs is determining the structure of molecules extracted from biological organisms. One method is 2D Nuclear Magnetic Resonance (NMR) spectroscopy, which indicates bonds between nuclei in the compound and hence serves as a “fingerprint” of the compound. Computing a similarity score between the 2D NMR spectrum of a novel compound and that of a compound whose structure is known provides clues to the structure of the novel compound. Standard approaches to this problem do not scale to larger databases of compounds. Here we use deep convolutional Siamese networks to map NMR spectra to a cluster space in which the distance between two spectra measures their similarity. This approach yields an AUC score more than four times better than that of an approach using LDA.
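
The idea of scoring similarity by distance in a learned space can be illustrated with a minimal sketch. It assumes the spectra have already been mapped to embedding vectors (the hard-coded vectors below are hypothetical stand-ins for the output of a convolutional network) and it uses the contrastive loss commonly applied to Siamese networks; the abstract does not state which loss or distance the authors use, so both are assumptions, as are all function names and the `margin` parameter.

```python
import math


def euclidean_distance(a, b):
    """Distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_compound, margin=1.0):
    """Contrastive loss for one pair of embeddings (assumed loss, not confirmed by the source).

    Similar pairs are penalised by their squared distance; dissimilar pairs
    are penalised only when they lie closer together than `margin`.
    """
    d = euclidean_distance(emb_a, emb_b)
    if same_compound:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2


def rank_by_similarity(query, database):
    """Return database keys ordered from nearest to farthest embedding."""
    return sorted(database, key=lambda name: euclidean_distance(query, database[name]))


if __name__ == "__main__":
    # Hypothetical embeddings standing in for network outputs.
    known = {"compound_a": [1.0, 0.0], "compound_b": [0.1, 0.0], "compound_c": [5.0, 5.0]}
    novel = [0.0, 0.0]
    print(rank_by_similarity(novel, known))
```

Once the embeddings are computed, a lookup of this kind only requires distance evaluations against the stored vectors, so the known compounds nearest to a novel spectrum can be read off directly as structural candidates.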
