Querying Linguistic Treebanks with Monadic Second-Order Logic in Linear Time

In recent years, large amounts of electronic text have become available. While the first of these corpora had only a low level of annotation, the more recent ones are annotated with refined syntactic information. To make these rich annotations accessible to linguists, the development of query systems has become an important goal. One of the main difficulties in this task lies in choosing the right query language: one that is powerful enough to let users formulate the queries they want, yet can be evaluated efficiently enough to keep query response times short. It is widely believed that no such query language exists. The aim of this paper is to show that a powerful query language that can be evaluated efficiently does indeed exist. We propose monadic second-order logic as a query language and show that, for a fixed query, a formula of this logic can be evaluated in time linear in the size of a tree in the corpus. We also provide examples of complex linguistic queries expressed in monadic second-order logic, thereby demonstrating the high expressive power of the language.
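
The abstract does not spell out the evaluation algorithm. The sketch below illustrates only the classical idea behind linear-time evaluation of monadic second-order queries on trees, and is not the paper's own method. By the results of Thatcher and Wright and of Doner, every MSO formula over trees can be compiled into a finite bottom-up tree automaton. Running that automaton takes a single pass over the tree with constant work per node. The query ("select every NP from which a PP can be reached through a downward chain of NPs"), the `Node` class and the function `select_np_reaching_pp` are all hypothetical. The automaton is written by hand rather than produced by an MSO compiler.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical syntax-tree node: a category label plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def select_np_reaching_pp(root: Node) -> List[Node]:
    """Return NP nodes from which a PP is reachable via a downward chain of NPs.

    Hypothetical example query. Informally, in MSO:
        NP(x) and for all X:
            (x in X and for all u, v: (u in X and NP(u) and child(u, v)) -> v in X)
            -> exists y: (y in X and PP(y))
    Hand-compiled (not machine-compiled) into a two-state bottom-up automaton:
    the state of v is True iff v is a PP, or v is an NP with a child in state True.
    """
    state: Dict[int, bool] = {}  # id(node) -> automaton state
    selected: List[Node] = []
    # Iterative post-order traversal: each node is pushed twice and popped
    # twice, and each child is inspected once by its parent, so the total
    # work is linear in the number of nodes.
    stack: List[Tuple[Node, bool]] = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if not children_done:
            # First visit: revisit this node after all of its children.
            stack.append((node, True))
            for child in node.children:
                stack.append((child, False))
            continue
        # Transition function: depends only on this node's label and the
        # states already computed for its children.
        reach = node.label == "PP" or (
            node.label == "NP" and any(state[id(c)] for c in node.children)
        )
        state[id(node)] = reach
        if node.label == "NP" and reach:
            selected.append(node)
    return selected
```

The automaton has a fixed, finite set of states, so each node costs a constant amount of work beyond scanning its children. The expensive step is compiling an arbitrary MSO formula into such an automaton, which is omitted here. In the worst case the automaton's size grows non-elementarily with the formula, so "linear time" refers to the size of the tree for a fixed query.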
