Building and Querying Parallel Treebanks

This paper describes our work on building a trilingual parallel treebank. We have annotated constituent structure trees for texts from three genres (a philosophy novel, economic reports and a technical user manual). Our parallel treebank includes word and phrase alignments. The alignment information was manually checked using a graphical tool that allows the annotator to view a pair of trees from parallel sentences side by side. The tool also provides a search facility whose query language is more expressive than those of earlier widely used treebank query engines.
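To make the idea of word and phrase alignments between constituent trees concrete, here is a minimal sketch in Python. It is not the data format or query language of the tool described above. The class names (`Node`, `Alignment`), the function `aligned_pairs`, the node identifiers and the English-German example phrase are all assumptions invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A tree node: a phrase (no word) or a terminal (with a word)."""
    node_id: str
    label: str                      # phrase category or part-of-speech tag
    word: Optional[str] = None      # set only for terminals
    children: List["Node"] = field(default_factory=list)

    def walk(self):
        """Yield this node and all of its descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def text(self) -> str:
        """Return the words dominated by this node, in order."""
        return " ".join(n.word for n in self.walk() if n.word is not None)


@dataclass(frozen=True)
class Alignment:
    """A link between a node in the source tree and a node in the target tree."""
    source_id: str
    target_id: str


def aligned_pairs(src: Node, tgt: Node, links: List[Alignment],
                  label: str) -> List[Tuple[Node, Node]]:
    """Find aligned node pairs whose source node carries the given label."""
    src_index = {n.node_id: n for n in src.walk()}
    tgt_index = {n.node_id: n for n in tgt.walk()}
    return [
        (src_index[a.source_id], tgt_index[a.target_id])
        for a in links
        if a.source_id in src_index
        and a.target_id in tgt_index
        and src_index[a.source_id].label == label
    ]


# Hypothetical example: one noun phrase in each language.
en = Node("en1", "NP", children=[Node("en2", "DT", "the"),
                                 Node("en3", "NN", "house")])
de = Node("de1", "NP", children=[Node("de2", "ART", "das"),
                                 Node("de3", "NN", "Haus")])
links = [Alignment("en1", "de1"),   # phrase alignment
         Alignment("en2", "de2"),   # word alignments
         Alignment("en3", "de3")]

pairs = aligned_pairs(en, de, links, "NP")
```

Under these assumptions, a query such as "all source noun phrases together with their aligned target nodes" reduces to a lookup over the alignment links, filtered by a condition on the source tree. A real query engine would add conditions on the target tree and on structural relations such as dominance and precedence.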
