Mining Syntactically Annotated Corpora with XQuery

This paper presents a uniform approach to extracting data from syntactically annotated corpora encoded in XML. XQuery, which incorporates XPath, was designed as a query language for XML. Together, XPath and XQuery offer flexibility and expressive power, and corpus-specific functions can be added to reduce the complexity of individual extraction tasks. We illustrate the approach with examples from dependency treebanks for Dutch.
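To make the idea concrete, here is a minimal sketch of path-based extraction with one corpus-specific helper. It is not the paper's code: it uses Python's standard xml.etree.ElementTree module and its limited XPath subset as a stand-in for XQuery. The XML fragment is hand-written and simplified, the attribute names (rel, cat, pos, root, word) are assumptions about an Alpino-style encoding, and head_root and verb_object_pairs are hypothetical names introduced only for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment in the style of a dependency treebank.
# The element and attribute names are assumptions made for illustration.
SAMPLE = """
<alpino_ds>
  <node rel="top" cat="top">
    <node rel="--" cat="smain">
      <node rel="su" pos="name" root="Jan" word="Jan"/>
      <node rel="hd" pos="verb" root="lees" word="leest"/>
      <node rel="obj1" cat="np">
        <node rel="det" pos="det" root="een" word="een"/>
        <node rel="hd" pos="noun" root="boek" word="boek"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def head_root(node):
    """Hypothetical corpus-specific helper: root form of a node's lexical head."""
    if node.get("root") is not None:
        # A lexical node is its own head.
        return node.get("root")
    head = node.find("node[@rel='hd']")
    return head_root(head) if head is not None else None


def verb_object_pairs(tree):
    """Yield (verb root, object head root) for each node with a verbal head and a direct object."""
    for clause in tree.iter("node"):
        verb = clause.find("node[@rel='hd'][@pos='verb']")
        obj = clause.find("node[@rel='obj1']")
        if verb is not None and obj is not None:
            yield verb.get("root"), head_root(obj)


tree = ET.fromstring(SAMPLE)
pairs = list(verb_object_pairs(tree))
print(pairs)
```

On this sample the script prints [('lees', 'boek')]. The helper hides the recursion needed to find the lexical head of a phrase, so the extraction function itself stays a short pair of path expressions. That is the kind of complexity reduction corpus-specific functions are meant to provide.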
