Example-Based Treebank Querying

The recent construction of large linguistic treebanks for spoken and written Dutch (e.g. CGN, LASSY, Alpino) has created new opportunities for the empirical investigation of Dutch syntax and semantics. However, exploiting those treebanks requires knowledge of specific data structures and of query languages such as XPath. Linguists who are unfamiliar with formal languages are often reluctant to learn such a language. To make treebank querying more attractive for non-technical users, we developed GrETEL (Greedy Extraction of Trees for Empirical Linguistics), a query engine in which linguists can use natural language examples as a starting point for searching the LASSY treebank, without knowledge of tree representations or formal query languages. By allowing linguists to search for constructions similar to the example they provide, we hope to bridge the gap between traditional and computational linguistics. Two case studies provide a concrete demonstration of the tool. The architecture of the tool is optimised for searching the LASSY treebank, but the approach can be adapted to other treebank layouts.
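
To illustrate the kind of formal querying the abstract refers to, the following is a minimal sketch of an XPath-style search over a single parse tree, written in Python with the standard library. The toy tree is a simplified, hand-written approximation of an Alpino-style XML dependency structure (nested `node` elements with `rel`, `cat`, `pos` and `word` attributes); the sentence, the function name `find_nps_with_determiner` and the query itself are illustrative assumptions, not GrETEL's implementation or output.

```python
import xml.etree.ElementTree as ET

# Hand-written toy tree (an assumption for illustration): a simplified
# approximation of an Alpino-style dependency structure for the sentence
# "de taalkundige doorzoekt de boombank".
TOY_TREE = """
<alpino_ds>
  <node rel="top" cat="top">
    <node rel="--" cat="smain">
      <node rel="su" cat="np">
        <node rel="det" pos="det" word="de"/>
        <node rel="hd" pos="noun" word="taalkundige"/>
      </node>
      <node rel="hd" pos="verb" word="doorzoekt"/>
      <node rel="obj1" cat="np">
        <node rel="det" pos="det" word="de"/>
        <node rel="hd" pos="noun" word="boombank"/>
      </node>
    </node>
  </node>
</alpino_ds>
"""


def find_nps_with_determiner(xml_text):
    """Return (relation, phrase) pairs for every NP that has a determiner child."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    # XPath step 1: all nodes anywhere in the tree whose category is "np".
    for np in root.findall(".//node[@cat='np']"):
        # XPath step 2: keep the NP only if a direct child has relation "det".
        if np.find("node[@rel='det']") is not None:
            # Collect the words of the leaves below this NP, in document order.
            words = [n.get("word") for n in np.iter("node") if n.get("word")]
            matches.append((np.get("rel"), " ".join(words)))
    return matches


if __name__ == "__main__":
    for relation, phrase in find_nps_with_determiner(TOY_TREE):
        print(relation, phrase)
```

Even this small query presupposes that the user knows the element and attribute names of the tree format, which is the barrier an example-based interface is meant to lower.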
