AsteriX: A Web Server To Automatically Extract Ligand Coordinates from Figures in PDF Articles

Coordinates describing the chemical structures of small molecules that are potential ligands for pharmaceutical targets are used at many stages of the drug design process. The coordinates of the vast majority of ligands can be obtained from either publicly accessible or commercial databases. However, interesting ligands are sometimes available only from the scientific literature, in which case their coordinates need to be reconstructed manually, a process that consists of a series of time-consuming steps. We present a Web server that helps reconstruct the three-dimensional (3D) coordinates of ligands for which a two-dimensional (2D) picture is available in a PDF file. The software, called AsteriX, analyses every picture contained in the PDF file and attempts to determine automatically whether or not it contains ligands. Areas in pictures that may contain molecular structures are processed to extract the connectivity and atom type information from which coordinates can subsequently be reconstructed. The AsteriX Web server was tested on a series of articles containing a wide diversity of graphical representations. In total, 88% of the 3249 ligand structures present in the test set were identified as chemical diagrams. Of these, about half were interpreted correctly as 3D structures, and a further one-third required only minor manual corrections. It is impossible in principle to always reconstruct 3D coordinates correctly from pictures, because there are many different conventions for drawing a 2D image of a ligand and, more importantly, a wide variety of semantic annotations are possible. The AsteriX Web server therefore includes facilities that allow users to augment partial or partially correct 3D reconstructions. All 3D reconstructions that users submit, check, and correct are stored at the server and are freely available to everybody. The coordinates of the reconstructed ligands are made available in a series of formats commonly used in drug design research.
The AsteriX Web server is freely available at http://swift.cmbi.ru.nl/bitmapb/.
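
The test-set percentages quoted above can be turned into approximate ligand counts. The short sketch below does this arithmetic; it assumes that "about half" and "one-third" are exact fractions of the identified diagrams, which the abstract does not state, so the resulting numbers are rough estimates rather than reported results. The function and variable names are illustrative only.

```python
# Figures quoted in the abstract.
TOTAL_LIGANDS = 3249
IDENTIFIED_FRACTION = 0.88  # share identified as chemical diagrams

# Assumption: "about half" and "a further one-third" are treated as
# exact fractions of the identified set; the true values may differ.
CORRECT_FRACTION = 1 / 2
MINOR_FIX_FRACTION = 1 / 3


def approximate_counts(total=TOTAL_LIGANDS,
                       identified_fraction=IDENTIFIED_FRACTION):
    """Estimate ligand counts per outcome from the quoted percentages."""
    identified = round(total * identified_fraction)
    correct = round(identified * CORRECT_FRACTION)
    minor_fix = round(identified * MINOR_FIX_FRACTION)
    return {
        "not_identified": total - identified,
        "identified": identified,
        "correct_3d": correct,
        "minor_corrections": minor_fix,
        # Identified diagrams that fall in neither category above.
        "other_identified": identified - correct - minor_fix,
    }


if __name__ == "__main__":
    for outcome, count in approximate_counts().items():
        print(f"{outcome}: ~{count}")
```

Under these assumptions, roughly five in six of the identified diagrams needed at most minor manual correction.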
