Paper information - A Tool/Database Interface for Multi-Level Analyses

Depending on the nature of a linguistic theory, empirical investigations of its soundness may focus on corpus studies of lexical, syntactic, semantic or other phenomena. Work in research networks in particular usually comprises analyses at different levels of description, each of which must be as reliable as possible when the same sentences and texts are investigated from very different perspectives. This paper describes an infrastructure that interfaces an analysis tool for multi-level annotation with a generic relational database. It supports three dimensions of analysis handling and thereby provides an integrated environment for quality assurance in corpus-based linguistic analysis: a vertical dimension relating analysis components in a pipeline, a horizontal dimension taking alternative results for the same analysis level into account, and a temporal dimension for following up on cases where analyses of the same input have been produced with different versions of a tool. As an example, we give a detailed description of a typical workflow for the vertical dimension.
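
The three dimensions named in the abstract can be illustrated with a small relational sketch. This is not the schema used in the paper. The table `analysis`, its columns, and the helper functions `open_db`, `add_analysis`, `pipeline`, `alternatives` and `history` are all hypothetical names chosen for illustration, and SQLite stands in for the generic relational database.

```python
import sqlite3

# Hypothetical schema: one row per analysis result of one input.
SCHEMA = """
CREATE TABLE analysis (
    id           INTEGER PRIMARY KEY,
    input_id     TEXT NOT NULL,
    level        TEXT NOT NULL,
    tool         TEXT NOT NULL,
    tool_version TEXT NOT NULL,
    parent_id    INTEGER REFERENCES analysis(id),
    result       TEXT NOT NULL
);
"""


def open_db():
    """Create an in-memory database holding the illustrative schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_analysis(conn, input_id, level, tool, version, result, parent_id=None):
    """Store one analysis result and return its row id."""
    cur = conn.execute(
        "INSERT INTO analysis "
        "(input_id, level, tool, tool_version, parent_id, result) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (input_id, level, tool, version, parent_id, result),
    )
    return cur.lastrowid


def pipeline(conn, analysis_id):
    """Vertical dimension: levels from the first pipeline step to this one."""
    levels = []
    current = analysis_id
    while current is not None:
        level, current = conn.execute(
            "SELECT level, parent_id FROM analysis WHERE id = ?", (current,)
        ).fetchone()
        levels.append(level)
    return levels[::-1]


def alternatives(conn, input_id, level):
    """Horizontal dimension: tools that analysed this input at this level."""
    rows = conn.execute(
        "SELECT DISTINCT tool FROM analysis "
        "WHERE input_id = ? AND level = ? ORDER BY tool",
        (input_id, level),
    ).fetchall()
    return [row[0] for row in rows]


def history(conn, input_id, level, tool):
    """Temporal dimension: results of one tool across versions, oldest first."""
    return conn.execute(
        "SELECT tool_version, result FROM analysis "
        "WHERE input_id = ? AND level = ? AND tool = ? ORDER BY id",
        (input_id, level, tool),
    ).fetchall()
```

In this sketch, a syntactic analysis stored with `parent_id` pointing at a part-of-speech analysis forms a two-step pipeline, two taggers applied to the same sentence appear as horizontal alternatives, and two versions of the same tagger appear in its history.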

Ulrich Heid | Kurt Eberle | Kerstin Eckart | Boris Haselbach
