A high-speed transcription interface for annotating primary linguistic data

We present a new transcription mode for the annotation tool ELAN. This mode is designed to speed up the process of creating transcriptions of primary linguistic data (video and/or audio recordings of linguistic behaviour). We survey the basic transcription workflow of some commonly used tools (Transcriber, BlitzScribe, and ELAN) and describe how the new transcription interface improves on these existing implementations. We describe the design of the transcription interface and explore some further possibilities for improvement in the areas of segmentation and computational enrichment of annotations.