Building and Using a Corpus of Shallow Dialogue Annotated Meetings

In this paper we provide a framework for shallow dialogue annotation (SDA) and for its use in processing and retrieving multimodal meeting recordings. The SDA model comprises the following elements: segmentation of dialogue into utterances and episodes, detection of dialogue acts and adjacency pairs, and detection of referring expressions and coreference links, including references to documents. An instantiated XML annotation model based on boundaries, labels, and links is provided. The use of SDA data in a meeting retrieval interface is also described.
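
To make the "boundaries, labels and links" idea concrete, the following is a minimal sketch of how such annotations might be represented and serialized to XML. It is not the paper's actual schema: the class names, the element names (`sda`, `segment`, `label`, `link`), the attribute names, and the sample tag values are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET


@dataclass
class Segment:
    """A unit delimited by two boundaries (times in seconds). Hypothetical."""
    id: str
    kind: str    # e.g. "utterance" or "episode"
    start: float
    end: float


@dataclass
class Label:
    """A tag attached to one annotated unit, e.g. a dialogue act. Hypothetical."""
    target: str  # id of the unit being labeled
    name: str    # e.g. "dialogue-act"
    value: str   # e.g. "question"


@dataclass
class Link:
    """A directed relation between two annotated units. Hypothetical."""
    kind: str    # e.g. "adjacency-pair", "coreference", "doc-reference"
    source: str
    target: str


def to_xml(segments, labels, links):
    """Serialize the three annotation types under one (assumed) root element."""
    root = ET.Element("sda")
    for s in segments:
        ET.SubElement(root, "segment", {
            "id": s.id, "kind": s.kind,
            "start": f"{s.start:.2f}", "end": f"{s.end:.2f}",
        })
    for lab in labels:
        ET.SubElement(root, "label", {
            "target": lab.target, "name": lab.name, "value": lab.value,
        })
    for ln in links:
        ET.SubElement(root, "link", {
            "kind": ln.kind, "source": ln.source, "target": ln.target,
        })
    return root


# Illustrative data: a question, its answer, and the adjacency pair linking them.
segments = [
    Segment("u1", "utterance", 0.0, 2.5),
    Segment("u2", "utterance", 2.5, 4.0),
]
labels = [
    Label("u1", "dialogue-act", "question"),
    Label("u2", "dialogue-act", "statement"),
]
links = [Link("adjacency-pair", "u1", "u2")]

example = to_xml(segments, labels, links)
xml_text = ET.tostring(example, encoding="unicode")
```

In this sketch, segmentation is carried by boundaries on `segment` elements, dialogue acts by `label` elements, and adjacency pairs or coreference by `link` elements that refer to units by id. A reference to a document could be expressed the same way, with a link whose target identifies a document element.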
