Extraction of Information from the Text of Chemical Patents. 1. Identification of Specific Chemical Names

Much attention has been paid to translating isolated chemical names into forms such as connection tables, but less effort has gone into identifying substance names in running text so that they can be made available for processing. Automatic name identification is becoming more urgent, not least because of the inherent importance of patents, the increasing complexity of newly synthesized substances, and the resulting need for error-free processing of information from patents and other documents. This paper describes the development of a methodology for isolating substance names in the text of English-language patents, using in part the SGML (Standard Generalized Markup Language) markup of the patent text as an aid. Evaluation of the procedures, which are still at an early stage of development, demonstrates that even simple methods can achieve very high degrees of success.
