OntoNotes: A Unified Relational Semantic Representation

The OntoNotes project is creating a corpus with large-scale, accurate, and integrated annotation of multiple levels of shallow semantic structure in text. Such rich, integrated annotation across many levels will support richer cross-level models and, in turn, significantly better automatic semantic analysis. At the same time, it demands a robust, efficient, and scalable mechanism for storing and accessing these complex, interdependent annotations. We describe a relational database representation that captures both inter- and intra-layer dependencies, and we provide details of an object-oriented API for efficient, multi-tiered access to this data.
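
The general idea of a relational representation with an object-oriented access layer can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table names (`token`, `sense`, `proposition`, `argument`), the `Corpus` class, its methods, and the sense label `buy.v.1` are hypothetical and are not the OntoNotes schema or API. The sketch uses Python's standard `sqlite3` module. Foreign keys from `argument` to `proposition` stand in for intra-layer dependencies, while the shared `token` table ties the sense and proposition layers together and stands in for inter-layer dependencies.

```python
import sqlite3

# Hypothetical schema (not the OntoNotes one): two annotation layers,
# word senses and propositions, anchored to a shared token table.
SCHEMA = """
CREATE TABLE token (
    id          INTEGER PRIMARY KEY,
    sentence_id INTEGER NOT NULL,
    position    INTEGER NOT NULL,
    word        TEXT NOT NULL
);
CREATE TABLE sense (
    token_id INTEGER PRIMARY KEY REFERENCES token(id),
    label    TEXT NOT NULL
);
CREATE TABLE proposition (
    id                 INTEGER PRIMARY KEY,
    predicate_token_id INTEGER NOT NULL REFERENCES token(id)
);
CREATE TABLE argument (
    id             INTEGER PRIMARY KEY,
    proposition_id INTEGER NOT NULL REFERENCES proposition(id),
    role           TEXT NOT NULL,
    start_token_id INTEGER NOT NULL REFERENCES token(id),
    end_token_id   INTEGER NOT NULL REFERENCES token(id)
);
"""


class Corpus:
    """Hypothetical object-oriented wrapper over the tables above."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("PRAGMA foreign_keys = ON")  # enforce the references
        self.db.executescript(SCHEMA)

    def add_sentence(self, sentence_id, words):
        """Insert one token row per word and return the new token ids."""
        ids = []
        for position, word in enumerate(words):
            cur = self.db.execute(
                "INSERT INTO token (sentence_id, position, word) VALUES (?, ?, ?)",
                (sentence_id, position, word),
            )
            ids.append(cur.lastrowid)
        return ids

    def add_sense(self, token_id, label):
        self.db.execute(
            "INSERT INTO sense (token_id, label) VALUES (?, ?)", (token_id, label)
        )

    def add_proposition(self, predicate_token_id, arguments):
        """arguments is a list of (role, start_token_id, end_token_id)."""
        cur = self.db.execute(
            "INSERT INTO proposition (predicate_token_id) VALUES (?)",
            (predicate_token_id,),
        )
        for role, start, end in arguments:
            self.db.execute(
                "INSERT INTO argument (proposition_id, role, start_token_id,"
                " end_token_id) VALUES (?, ?, ?, ?)",
                (cur.lastrowid, role, start, end),
            )

    def span_text(self, start, end):
        """Join the words of the tokens whose ids lie in [start, end]."""
        rows = self.db.execute(
            "SELECT word FROM token WHERE id BETWEEN ? AND ? ORDER BY id",
            (start, end),
        ).fetchall()
        return " ".join(row[0] for row in rows)

    def predicate_frames(self):
        """Cross-layer query: predicate word, its sense, and each argument."""
        rows = self.db.execute(
            """
            SELECT t.word, s.label, a.role, a.start_token_id, a.end_token_id
            FROM proposition p
            JOIN token t ON t.id = p.predicate_token_id
            LEFT JOIN sense s ON s.token_id = p.predicate_token_id
            JOIN argument a ON a.proposition_id = p.id
            ORDER BY p.id, a.id
            """
        ).fetchall()
        return [(w, label, role, self.span_text(s, e)) for w, label, role, s, e in rows]


corpus = Corpus()
john, bought, a, book = corpus.add_sentence(1, ["John", "bought", "a", "book"])
corpus.add_sense(bought, "buy.v.1")  # made-up sense label
corpus.add_proposition(bought, [("ARG0", john, john), ("ARG1", a, book)])
frames = corpus.predicate_frames()
```

Here `frames` pairs the predicate and its sense with the text of each argument span, so a single join answers a question that spans two layers. Keeping every layer anchored to one token table is the design choice that makes such joins straightforward in this sketch; a real schema would also need to anchor annotations to syntactic tree nodes rather than to flat token ranges.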

[1]  Mitchell P. Marcus,et al.  OntoNotes: The 90% Solution , 2006, NAACL.

[2]  Albert Gough,et al.  High-Content Screening: A New Approach to Easing Key Bottlenecks in the Drug Discovery Process , 1997 .

[3]  Xiaobo Zhou,et al.  High content cellular imaging for drug development , 2006 .

[4]  Arif Ghafoor,et al.  Semantic Analysis of Biological Imaging Data: Challenges and Opportunities , 2007, Int. J. Semantic Comput..

[5]  Ralph Grishman,et al.  Covering Treebanks with GLARF , 2001, ACL 2001.

[6]  John B. Lowe,et al.  The Berkeley FrameNet Project , 1998, ACL.

[7]  James Pustejovsky,et al.  Merging PropBank, NomBank, TimeBank, Penn Discourse Treebank and Coreference , 2005, FCA@ACL.

[8]  Jason R Swedlow,et al.  To 5D and Beyond: Quantitative Fluorescence Microscopy in the Postgenomic Era , 2002, Traffic.

[9]  Laurent Romary,et al.  International standard for a linguistic annotation framework , 2003, HLT-NAACL 2003.

[10]  Nancy Ide,et al.  International Standard for a Linguistic Annotation Framework , 2003, Natural Language Engineering.

[11]  Arif Ghafoor,et al.  Object-oriented conceptual modeling of video data , 1995, Proceedings of the Eleventh International Conference on Data Engineering.

[12]  Olga Babko-Malaya,et al.  Different Sense Granularities for Different Applications , 2004, HLT-NAACL 2004.

[13]  Jean-Christophe Olivo-Marin,et al.  On the digital trail of mobile cells , 2006, IEEE Signal Processing Magazine.

[14]  Beatrice Santorini,et al.  Building a Large Annotated Corpus of English: The Penn Treebank , 1993, CL.

[15]  Xiaobo Zhou,et al.  Informatics challenges of high-throughput microscopy , 2006, IEEE Signal Processing Magazine.

[16]  John C Reed,et al.  Advances in molecular labeling, high throughput imaging and machine intelligence portend powerful functional cellular biochemistry tools , 2002, Journal of cellular biochemistry. Supplement.

[17]  Robert F Murphy,et al.  From quantitative microscopy to automated image understanding. , 2004, Journal of biomedical optics.

[18]  Charles J. Fillmore,et al.  The Structure of the Framenet Database , 2003 .

[19]  Patrick Pantel,et al.  The Omega Ontology , 2005, IJCNLP.

[20]  James F. Allen Maintaining knowledge about temporal intervals , 1983, CACM.

[21]  Arif Ghafoor,et al.  Quantitative Analysis of Inter-object Spatial Relationships in Biological Images , 2007, 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering.

[22]  Daniel Gildea,et al.  The Proposition Bank: An Annotated Corpus of Semantic Roles , 2005, CL.

[23]  Jia Liu,et al.  XML-Based Data Model and Architecture for a Knowledge-Based Grid-Enabled Problem-Solving Environment for High-Throughput Biological Imaging , 2008, IEEE Transactions on Information Technology in Biomedicine.

[24]  Seth Kulick,et al.  Issues in Synchronizing the English Treebank and PropBank , 2006 .

[25]  Thilo Götz,et al.  Design and implementation of the UIMA Common Analysis System , 2004, IBM Syst. J..

[26]  Katrin Erk,et al.  A Powerful and Versatile XML Format for Representing Role-semantic Annotation , 2004, LREC.