Assessing Annotated Corpora as Research Output

The increasing importance of language documentation as a paradigm in linguistic research means that many linguists now spend substantial amounts of time preparing digital corpora of language data for long-term access. Benefits of this development include: (i) making analyses accountable to the primary material on which they are based; (ii) providing future researchers with a body of linguistic material to analyse in ways not foreseen by the original collector of the data; and, equally importantly, (iii) acknowledging the responsibility of the linguist to create records that can be accessed by the speakers of the language and by their descendants. Preparing such data collections requires substantial scholarly effort, and to make this approach sustainable, those who undertake it need to receive appropriate academic recognition of their effort in relevant institutional contexts. Such recognition is especially important for early-career scholars, so that they can devote effort to compiling annotated corpora and making them accessible without damaging their careers in the long term through a negative impact on their publication record. Preliminary discussions between the Australian Linguistic Society (ALS) and the Australian Research Council (ARC) made it clear that the ARC accepts that curated corpora can legitimately be seen as research output, but that it is the responsibility of the ALS (and the scholarly community more generally) to establish conventions that accord scholarly credibility to such research products. This paper reports on the activities of the authors in exploring this issue on behalf of the ALS and discusses two questions: (a) what sort of process is appropriate for according acknowledgment and validation to curated corpora as research output; and (b) what are the appropriate criteria against which such corpora should be judged?
While the discussion focuses on the Australian linguistic context, it is also more broadly applicable, as we show in this article.
