Towards a Cross-Linguistic Production Data Archive: Structure and Exploration

This paper presents the structure of a cross-linguistic database of production data. The database contains annotated texts collected from a sample of fifteen languages using identical data-gathering methods designed to enable studies on the typology and universals of information structure. Its distinguishing property is that it combines the features of a natural language corpus with those of a typological database. The challenge for the exploration interface is to provide user-friendly support for exploiting this type of resource, thereby facilitating empirical generalizations about the data collected in the individual languages and comparisons among them.
