Multi-level annotation in the Emu speech database management system

Researchers in various fields, from acoustic phonetics to child language development, rely on digitised collections of spoken language data as raw material for research. Access to this data has, in the past, been provided in an ad hoc manner, with labelling standards and software tools developed to serve only one or two projects. A few attempts have been made at providing generalised access to speech corpora, but none of these has gained widespread popularity. The Emu system, described here, is a general-purpose speech database management system that supports complex multi-level annotations. Emu can read a number of popular label and data file formats and supports overlaying additional annotation, with inter-token relations, on existing time-aligned label files. Emu provides a graphical labelling tool that can be extended to provide special-purpose displays. The software is easily extended via the Tcl/Tk scripting language, which can be used, for example, to manipulate annotations and build graphical tools for database creation. This paper discusses the design of the Emu system, giving a detailed description of the annotation structures that it supports. It is argued that these structures are sufficiently general to allow Emu to read potentially any time-aligned linguistic annotation.
