Extracting information from multimedia meeting collections

Multimedia meeting collections consist of unedited audio and video streams, handwritten notes, slides, and electronic documents that jointly constitute a raw record of complex human interaction processes in the workplace. They have attracted interest for three reasons: the increasing feasibility of recording them in large quantities, the opportunities for information access and retrieval applications that arise from automatically extracting relevant meeting information, and the challenges that extracting semantic information from real human activities entails. In this paper, we present a succinct overview of recent approaches in this field, largely influenced by our own experience. We first review some of the existing and potential needs of users of multimedia meeting information systems. We then summarize recent work in various research areas addressing some of these requirements. In more detail, we describe our work on automatic analysis of human interaction patterns from audio-visual sensors, and discuss open issues in this domain.
