Mining chemical information from open patents

Linked Open Data presents an opportunity to vastly improve the quality of science in all fields by increasing the availability and usability of the data upon which it is based. In the chemical field, a huge amount of information is available in the published literature, but the vast majority of it is not available in machine-understandable formats. PatentEye, a prototype system for the extraction and semantification of chemical reactions from the patent literature, has been implemented and is discussed. A total of 4444 reactions were extracted from 667 patent documents, comprising 10 weeks' worth of publications from the European Patent Office (EPO). The identity and amount of the reactants employed were determined with a precision of 78% and a recall of 64%, and products were identified with an accuracy of 92%. NMR spectra reported as product characterisation data are additionally captured.
