Current status and prospects of computational resources for natural product dereplication: a review

Research on natural products has long advanced drug discovery by supplying new and structurally unique chemical compounds. Recently, however, natural product drug discovery has slowed because of the growing likelihood of re-isolating known compounds. Dereplication, the rapid and automated identification of previously isolated compounds, steers researchers toward novel findings and thereby reduces the time and effort needed to identify new drug leads. Because dereplication identifies compounds by comparing processed experimental data with the data of known compounds, it depends on diverse computational resources, such as databases and tools for processing and comparing compound data. Automating the dereplication process by integrating these computational resources has long been a goal of natural product researchers. To encourage wider use of the computational resources currently available for natural products, we first give an overview of the dereplication process, then list useful resources, grouped into databases, methods and software tools, and explain each from a dereplication perspective. Finally, we discuss the current challenges to automating dereplication and proposed solutions to them.
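At its core, the comparison step of dereplication scores an experimental measurement against reference records and ranks the matches. The block below is a minimal sketch under stated assumptions, not a method taken from this review: it represents mass spectra as (m/z, intensity) peak lists, sums them into fixed-width m/z bins, and uses cosine similarity as the score. The function names, the bin width, the 0.8 threshold, and the two-entry reference library are hypothetical and chosen only for illustration. Real workflows add the processing steps the review covers (for example baseline correction, peak picking and alignment) and may use other similarity measures or NMR data instead of MS.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine similarity between two sparse binned spectra stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def dereplicate(query_peaks, library, threshold=0.8, bin_width=1.0):
    """Score every library entry against the query and return the hits at or
    above the (hypothetical) threshold, best match first."""
    query = bin_spectrum(query_peaks, bin_width)
    hits = []
    for name, ref_peaks in library.items():
        score = cosine_similarity(query, bin_spectrum(ref_peaks, bin_width))
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy reference library: names and peak lists are hypothetical placeholders,
# not real compound data.
library = {
    "known_compound_A": [(105.0, 40.0), (133.1, 100.0), (151.1, 60.0)],
    "known_compound_B": [(91.0, 100.0), (119.0, 30.0), (147.1, 20.0)],
}

# Hypothetical experimental spectrum of an isolated compound.
query = [(105.0, 35.0), (133.1, 95.0), (151.1, 65.0)]
print(dereplicate(query, library))
```

Here the query shares all three binned peaks with `known_compound_A` and none with `known_compound_B`, so only the first entry passes the threshold and the compound would be flagged as probably already known. Binning is used because it makes spectra with slightly shifted peaks comparable at the cost of m/z resolution; a smaller bin width trades that tolerance for specificity.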
