Towards an Objective Test for Meeting Browsers: The BET4TQB Pilot Experiment

This paper first outlines the BET method for the task-based evaluation of meeting browsers. 'Observations of interest' in meetings are determined empirically by neutral observers, then processed and ordered by evaluators. The paper then describes a BET evaluation of TQB, an annotation-driven meeting browser. A series of subjects attempted to answer as many meeting-related questions as possible in a fixed amount of time, and their performance was measured in terms of precision and speed. The results indicate that the TQB interface is easy to understand with little prior learning, and that its annotation-based search functionality, in particular keyword search over the meeting transcript, is highly relevant. Two knowledge-poorer browsers appear to offer lower precision but higher speed. The BET task-based evaluation method thus appears to be a coherent measure of browser quality.
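The abstract names precision and speed as the two BET performance measures but does not spell out the formulas. The sketch below is a minimal illustration, assuming precision is the fraction of attempted questions answered correctly and speed the number of questions handled per minute of browsing time; the BETSession layout and function names are hypothetical, not taken from the BET specification.

```python
from dataclasses import dataclass

@dataclass
class BETSession:
    """One subject's run: answers given within the fixed time budget (hypothetical layout)."""
    correct: int     # questions answered correctly
    attempted: int   # questions attempted before time ran out
    minutes: float   # fixed time budget allotted to the subject

def precision(s: BETSession) -> float:
    """Assumed definition: fraction of attempted questions answered correctly."""
    return s.correct / s.attempted if s.attempted else 0.0

def speed(s: BETSession) -> float:
    """Assumed definition: questions attempted per minute of browsing time."""
    return s.attempted / s.minutes if s.minutes else 0.0

# Example: a subject answers 9 of 12 attempted questions correctly in 20 minutes.
session = BETSession(correct=9, attempted=12, minutes=20.0)
print(f"precision = {precision(session):.2f}")               # 0.75
print(f"speed     = {speed(session):.2f} questions/minute")  # 0.60
```

Under these assumed definitions, the reported trade-off is easy to read off: a knowledge-poor browser can raise speed (more questions attempted per minute) while lowering precision (fewer of those attempts answered correctly).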
