Combining Parallel Treebanks and Geo-Tagging

This paper describes a new kind of semantic annotation in parallel treebanks. We build French-German parallel treebanks of mountaineering reports, a text genre that abounds with geographical names, which we classify and ground with reference to a large gazetteer of Swiss toponyms. We discuss the challenges of achieving high recall and precision in automatic grounding, and sketch how we represent the grounding information in our treebank.
