The ‘Language Archiving Technology’ solutions for sustainable data from digital fieldwork research

Since the late 1990s, the Technical Group at the Max Planck Institute for Psycholinguistics (TG, now The Language Archive, TLA) has worked on solutions for ‘sustainable data from digital research’, in particular on how to guarantee the long-term availability of digital data for future research. These activities have gained a new dimension and new dynamics since 2000, when the TG started to participate in the program Dokumentation Bedrohter Sprachen ‘Documentation of Endangered Languages’ (DOBES). DOBES was crucial in establishing the new field of language documentation, which aims at creating lasting multi-purpose corpora of annotated multimedia samples of small and understudied languages. The TG’s contribution included hosting the emerging DOBES archive, which was a major factor in the ongoing development of ‘Language Archiving Technology’ (LAT), explained in detail below. This chapter presents the LAT solutions to many of the challenges for sustainable data from digital fieldwork research; much of it carries over to other types of digital research data. First we outline our conception of digital fieldwork data and sustainability (§1), then we present the LAT suite of programs and web services (§2), discuss open access and legal and ethical issues (§3), and finally summarize the LAT answers to several relevant questions (§4). A short conclusion (§5) summarizes the most important points.
