Interoperability of Corpora and Annotations

This paper describes the application of OWL and RDF to the interoperability of linguistic corpora and of the linguistic annotations within such corpora. Interoperability of linguistic corpora involves two aspects: structural interoperability (annotations of different origins are represented using the same formalism) and conceptual interoperability (annotations of different origins are linked to a common vocabulary).
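
The two aspects can be illustrated with a minimal sketch. It is not the paper's implementation: it uses plain Python tuples in place of an RDF library, and every identifier in it (the `corpusA:`, `toolA:`, `toolB:` and `common:` prefixes, the tag names, and the helper `tokens_with_concept`) is a hypothetical name chosen for illustration. Structural interoperability is modelled by storing annotations from two tools as subject-predicate-object triples; conceptual interoperability is modelled by linking each tool-specific tag to a shared concept.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-predicate-object statement, as in RDF (simplified)."""
    subject: str
    predicate: str
    obj: str


# Structural interoperability: annotations produced by two different
# (hypothetical) tools are expressed in the same formalism, here triples.
annotations = [
    Triple("corpusA:tok1", "toolA:pos", "toolA:NN"),
    Triple("corpusB:tok7", "toolB:tag", "toolB:NOUN"),
]

# Conceptual interoperability: each tool-specific tag is linked to a
# concept in a common vocabulary (hypothetical concept names).
links = {
    "toolA:NN": "common:Noun",
    "toolB:NOUN": "common:Noun",
}


def tokens_with_concept(concept, annotations, links):
    """Return the subjects whose tag is linked to the given common concept."""
    return [t.subject for t in annotations if links.get(t.obj) == concept]


print(tokens_with_concept("common:Noun", annotations, links))
```

With both layers in place, a single query against the common vocabulary retrieves tokens from both corpora, even though the original tags differ.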
