Annotating Event Chains for Carbon Sequestration Literature

In this paper we present a project on annotating event chains for an important scientific domain: carbon sequestration. This domain aims to reduce carbon emissions and has been identified by the U.S. National Academy of Engineering (NAE) as a grand challenge problem for the 21st century. Given a collection of scientific literature, we identify a set of centroid experiments, and then link and order the observations and events centered on these experiments along temporal or causal chains. We describe the fundamental annotation challenges and our general solutions to them. We expect that our annotation efforts will produce significant advances in interoperability through new information extraction techniques, and will allow scientists to build knowledge that provides a better understanding of the important scientific challenges in this domain and to share and reuse diverse data sets and experimental results more efficiently. In addition, the metadata and ontology annotations for this literature will provide important support for data lifecycle activities.
