Ontology-Based XQuery’ing of XML-Encoded Language Resources on Multiple Annotation Layers

We present an approach for querying collections of heterogeneous linguistic corpora that are annotated on multiple layers using arbitrary XML-based markup languages. An OWL ontology provides a homogenising view of the conceptually different markup languages, so that a common querying framework can be established by means of ontology-based query expansion. In addition, we present a flexible web-based graphical interface for querying corpora with regard to several different linguistic properties, such as syntactic tree fragments. This interface can also be used for ontology-based querying of multiple corpora simultaneously.
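To make the idea of ontology-based query expansion concrete, the following is a minimal sketch and not the system described in the paper. It assumes a toy concept hierarchy in place of the OWL ontology, and a hand-written mapping from concepts to corpus-specific markup. It uses Python's limited XPath support rather than XQuery. All names (`SUBCONCEPTS`, `ANNOTATION_MAP`, `corpus_a`, `corpus_b`, the element and attribute names, and the tag values) are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical concept hierarchy standing in for the OWL ontology.
SUBCONCEPTS = {
    "Noun": ["CommonNoun", "ProperNoun"],
}

# Hypothetical mapping from ontology concepts to corpus-specific markup,
# given as (element name, attribute name, attribute value).
ANNOTATION_MAP = {
    "corpus_a": {
        "CommonNoun": ("w", "pos", "NN"),
        "ProperNoun": ("w", "pos", "NE"),
    },
    "corpus_b": {
        "CommonNoun": ("tok", "cat", "noun"),
        "ProperNoun": ("tok", "cat", "name"),
    },
}


def expand_concept(concept):
    """Return the concept followed by all of its transitive subconcepts."""
    result = [concept]
    for child in SUBCONCEPTS.get(concept, []):
        result.extend(expand_concept(child))
    return result


def expand_query(concept, corpus):
    """Translate one ontology concept into corpus-specific path expressions."""
    paths = []
    for name in expand_concept(concept):
        if name in ANNOTATION_MAP[corpus]:
            elem, attr, value = ANNOTATION_MAP[corpus][name]
            paths.append(f".//{elem}[@{attr}='{value}']")
    return paths


def query(concept, corpus, root):
    """Run the expanded query against a parsed corpus and collect token texts."""
    hits = []
    for path in expand_query(concept, corpus):
        hits.extend(node.text for node in root.findall(path))
    return hits


# Two tiny corpora that encode part of speech with different markup.
CORPUS_A = ET.fromstring(
    "<s><w pos='ART'>the</w><w pos='NN'>dog</w><w pos='NE'>Rex</w></s>"
)
CORPUS_B = ET.fromstring(
    "<s><tok cat='name'>Anna</tok><tok cat='verb'>sleeps</tok></s>"
)

if __name__ == "__main__":
    # The same abstract query is expanded differently for each corpus.
    print(query("Noun", "corpus_a", CORPUS_A))
    print(query("Noun", "corpus_b", CORPUS_B))
```

In this sketch a single abstract query for `Noun` is rewritten into one path expression per subconcept and per corpus, so the user does not need to know either tag set. A real implementation would presumably read the hierarchy and mappings from the ontology instead of hard-coding them.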
