Chemical Informatics Functionality in R

The flexibility and scope of the R programming environment has made it a popular choice for statistical modeling and scientific prototyping in a number of fields. In the field of chemistry, R provides several tools for a variety of problems related to statistical modeling of chemical information. However, one aspect common to these tools is that they do not have direct access to the information that is available from chemical structures, such as contained in molecular descriptors. We describe the rcdk package that provides the R user with access to the CDK, a Java framework for cheminformatics. As a result, it is possible to read in a variety of molecular formats, calculate molecular descriptors and evaluate fingerprints. In addition, we describe the rpubchem that will allow access to the data in PubChem, a public repository of molecular structures and associated assay data for approximately 8 million compounds. Currently, the package allows access to structural information as well as some simple molecular properties from PubChem. In addition the package allows access to bio-assay data from the PubChem FTP servers.

[1]  David Vidal,et al.  LINGO, an Efficient Holographic Text Based Method To Calculate Biophysical Properties and Intermolecular Similarities , 2005, J. Chem. Inf. Model..

[2]  P. Selzer,et al.  Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. , 2000, Journal of medicinal chemistry.

[3]  Luhua Lai,et al.  A New Atom-Additive Method for Calculating Partition Coefficients , 1997, J. Chem. Inf. Comput. Sci..

[4]  Peter C. Jurs,et al.  Prediction of the Normal Boiling Points of Organic Compounds from Molecular Structures with a Computational Neural Network Model , 1999, J. Chem. Inf. Comput. Sci..

[5]  C. Steinbeck,et al.  Recent developments of the chemistry development kit (CDK) - an open-source java library for chemo- and bioinformatics. , 2006, Current pharmaceutical design.

[6]  Naomie Salim,et al.  Analysis and Display of the Size Dependence of Chemical Similarity Coefficients , 2003, J. Chem. Inf. Comput. Sci..

[7]  Ting Chen,et al.  R-NN Curves: An Intuitive Approach to Outlier Detection Using a Distance Based Method , 2006, J. Chem. Inf. Model..

[8]  Brian K. Shoichet,et al.  ZINC - A Free Database of Commercially Available Compounds for Virtual Screening , 2005, J. Chem. Inf. Model..

[9]  J. H. Ward Hierarchical Grouping to Optimize an Objective Function , 1963 .

[10]  Roberto Todeschini,et al.  Handbook of Molecular Descriptors , 2002 .

[11]  Trevor Rowe,et al.  Web-accessible scientific workflow system for performance monitoring. , 2006, Environmental science & technology.

[12]  Christoph Steinbeck,et al.  JChemPaint - Using the collaborative forces of the Internet to develop a free editor for 2D chemical structures , 2000 .

[13]  John M. Barnard,et al.  Chemical Similarity Searching , 1998, J. Chem. Inf. Comput. Sci..

[14]  Jeffrey J. Sutherland,et al.  Spline-Fitting with a Genetic Algorithm: A Method for Developing Classification Structure-Activity Relationships , 2003, J. Chem. Inf. Comput. Sci..