New Publicly Available Chemical Query Language, CSRML, To Support Chemotype Representations for Application to Data Mining and Modeling

Chemotypes are a new approach for representing molecules, chemical substructures and patterns, reaction rules, and reactions. Chemotypes can integrate types of information beyond what current representation methods (e.g., SMARTS patterns) or reaction transformations (e.g., SMIRKS, reaction SMILES) can express. Chemotypes are expressed in the XML-based Chemical Subgraphs and Reactions Markup Language (CSRML) and can encode not only connectivity and topology but also properties of atoms, bonds, electronic systems, or molecules. CSRML has been developed in parallel with a public set of chemotypes, the ToxPrint chemotypes, which are designed to provide excellent coverage of environmental, regulatory, and commercial-use chemical space and to represent chemical patterns and properties especially relevant to various toxicity concerns. A software application, ChemoTyper, has also been developed and made publicly available to enable chemotype searching and fingerprinting against a target structure set. The public ChemoTyper houses the ToxPrint chemotype CSRML dictionary, as well as a reference implementation, so that the query specifications may be adopted by other chemical structure knowledge systems. The full specifications of the XML-based CSRML standard used to express chemotypes are publicly available to facilitate and encourage the exchange of structural knowledge.
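To make the idea of chemotype fingerprinting concrete, the following is a minimal Python sketch under loudly stated assumptions: it is not CSRML syntax and not ChemoTyper's implementation. Molecules are toy heavy-atom graphs with implicit hydrogens, each chemotype is a hypothetical list of atom and bond predicates (so a query can test an atom property such as formal charge alongside connectivity), and the fingerprint is one bit per chemotype in a dictionary. The names `Molecule`, `Chemotype`, `matches`, `fingerprint`, `is_el`, `TOY_DICTIONARY`, and `TOY_MOLECULES` are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# Toy, hypothetical data model (NOT CSRML syntax): heavy atoms only, hydrogens implicit.
@dataclass
class Molecule:
    name: str
    atoms: List[Dict[str, object]]  # per-atom properties, e.g. {"element": "N", "charge": 1}
    bonds: Dict[Tuple[int, int], int] = field(default_factory=dict)  # (i, j) -> bond order

    def bond(self, i: int, j: int) -> Optional[int]:
        """Bond order between atoms i and j, or None if they are not bonded."""
        return self.bonds.get((i, j), self.bonds.get((j, i)))

AtomTest = Callable[[Dict[str, object]], bool]
BondTest = Callable[[int], bool]

@dataclass
class Chemotype:
    """Hypothetical query: one predicate per query atom plus predicates on required bonds."""
    name: str
    atom_tests: List[AtomTest]
    bond_tests: List[Tuple[int, int, BondTest]] = field(default_factory=list)

def matches(query: Chemotype, mol: Molecule) -> bool:
    """Backtracking search for an injective map of query atoms onto molecule atoms."""
    mapping: List[int] = []  # mapping[q] = molecule atom assigned to query atom q

    def consistent(q: int, m: int) -> bool:
        if m in mapping or not query.atom_tests[q](mol.atoms[m]):
            return False
        # Check each query bond that links q to an already-mapped query atom.
        for a, b, test in query.bond_tests:
            other = b if a == q else a if b == q else None
            if other is None or other >= len(mapping):
                continue
            order = mol.bond(m, mapping[other])
            if order is None or not test(order):
                return False
        return True

    def extend() -> bool:
        q = len(mapping)
        if q == len(query.atom_tests):
            return True  # every query atom has been placed
        for m in range(len(mol.atoms)):
            if consistent(q, m):
                mapping.append(m)
                if extend():
                    return True
                mapping.pop()
        return False

    return extend()

def fingerprint(dictionary: List[Chemotype], mols: List[Molecule]) -> Dict[str, List[int]]:
    """One bit per chemotype: 1 if the chemotype occurs in the structure, else 0."""
    return {mol.name: [int(matches(c, mol)) for c in dictionary] for mol in mols}

def is_el(symbol: str) -> AtomTest:
    return lambda atom: atom["element"] == symbol

TOY_DICTIONARY = [
    Chemotype("carbonyl C=O", [is_el("C"), is_el("O")], [(0, 1, lambda o: o == 2)]),
    Chemotype("C-O single bond", [is_el("C"), is_el("O")], [(0, 1, lambda o: o == 1)]),
    # Connectivity plus an atom property: a nitrogen carrying a +1 formal charge.
    Chemotype("cationic N", [lambda a: a["element"] == "N" and a.get("charge", 0) == 1]),
]

TOY_MOLECULES = [
    Molecule("acetone", [{"element": "C"}, {"element": "C"}, {"element": "O"}, {"element": "C"}],
             {(0, 1): 1, (1, 2): 2, (1, 3): 1}),
    Molecule("ethanol", [{"element": "C"}, {"element": "C"}, {"element": "O"}],
             {(0, 1): 1, (1, 2): 1}),
    Molecule("tetramethylammonium",
             [{"element": "N", "charge": 1}] + [{"element": "C"} for _ in range(4)],
             {(0, k): 1 for k in range(1, 5)}),
]

if __name__ == "__main__":
    for name, bits in fingerprint(TOY_DICTIONARY, TOY_MOLECULES).items():
        print(name, bits)
```

Running the script prints `acetone [1, 0, 0]`, `ethanol [0, 1, 0]`, and `tetramethylammonium [0, 0, 1]`: one row per structure and one column per chemotype. A real system would read its queries from a CSRML dictionary and rely on a full cheminformatics toolkit for ring, aromaticity, and electronic-property perception; the brute-force backtracking here only illustrates the shape of the computation.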
