Querying Annotated Speech Corpora

This paper addresses the querying of annotated speech corpora. A growing number of such corpora are being created worldwide; however, their usefulness to the wider research community is limited by the lack of standard tools for creating, editing, annotating, storing, and querying them. Two solutions to these problems are presented here: TASX, an XML-based data format for corpus creation and data format exchange, and the NXT search tool for querying corpora. Both have been applied to LeaP, a multi-level annotated corpus of non-native speech.
