Language Resources and Chemical Informatics

Research papers are a primary source of information in chemistry, as in any scientific field. Most of the data they contain is presented as unstructured text, so it is not immediately amenable to the information-processing techniques that chemical informatics has developed for carrying out chemistry research. At one level, extracting the relevant information from research papers is a text mining task that requires both extensive language resources and specialised knowledge of the subject domain. However, the papers also encode information about how the research is conducted and about the structure of the field itself. Applying language technology to chemistry research papers can therefore facilitate eScience on several levels. The SciBorg project sets out to provide an extensive, analysed corpus of published chemistry research. This relies on the cooperation of several journal publishers, who provide papers in an appropriate form. The work is a collaboration involving the Computer Laboratory, the Chemistry Department and the eScience Centre at Cambridge University, and is funded under the UK eScience programme.
