Information structure in African languages: corpora and tools

This paper describes tools and resources for the study of African languages that were developed at the Collaborative Research Centre 632 “Information Structure”. These include deeply annotated data collections for 25 sub-Saharan languages, described together with their annotation scheme, and the corpus tool ANNIS, which provides unified access to a broad variety of annotations created with different tools. By applying ANNIS to several African data collections, we illustrate its suitability for language documentation, distributed access, and the creation of data archives.
