Open-Source Tools, Techniques, and Data in Chemoinformatics

Chemicals are everywhere. They are composed of atoms held together by bonds, and they both support life and provide everyday comfort. The numerous ways in which these entities can combine give rise to the complexity and diversity we observe in the universe. Chemistry is the discipline that analyzes and tries to explain this complexity at the atomic level. Advances in the field have generated ever more data, leading to an information explosion. Over time, these observations have been recorded in chemical documents such as journals, patents, and research reports. The vast body of chemical literature, spanning more than two centuries, demands extensive use of information technology to manage it. Today, chemoinformatics tools and methods have grown powerful enough to handle this huge resource of chemical information and to discover previously unexplored knowledge within it. The role of chemoinformatics is to add value to every bit of chemical data. The underlying theme of the domain is how to develop efficient chemicals with predicted physicochemical and biological properties for economic, social, health, safety, and environmental benefit. In this chapter, we begin with a brief definition of open-source tools and their role in chemoinformatics, and then discuss the basic computer knowledge needed to understand this specialized, interdisciplinary subject. This is followed by an in-depth analysis of traditional and advanced methods for handling chemical structures in computers, an elementary but essential precursor to any chemoinformatics task. Practical step-by-step guidance on using open-source, free, academic, and commercial structure representation tools is also provided. For a better understanding, readers are strongly encouraged to attempt the practice tutorials, Do It Yourself exercises, and questions given in each chapter. This chapter is designed to help experimental chemists, biologists, mathematicians, physicists, computer scientists, and others understand the subject in a practical way through relevant, easy-to-understand examples, and to encourage readers to proceed to the advanced topics in the subsequent chapters.
