A multimodal alignment framework for spoken documents

We present a multimodal document alignment framework that highlights existing alignment relationships between documents that are discussed and recorded during multimedia events such as meetings. These relationships, which should help in indexing the archives of such events, are detected using various techniques from natural language processing and information retrieval. The main alignment strategies studied are based on thematic, quotation, and reference relationships. At the analysis level, the alignment framework was applied at several levels of document granularity, which required specific document segmentation techniques. The framework is language independent and was evaluated on French and English corpora, including meetings and scientific presentations. The satisfactory evaluation results obtained at several stages show the value of our approach in bridging the gap between meeting documents, regardless of language and domain. They also highlight the utility of multimodal alignment in advanced applications such as multimedia document browsing and content-based or temporal searching.
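To make the idea of a thematic alignment concrete, the following is a minimal sketch that links segments of a meeting document to segments of a speech transcript using TF-IDF weighted cosine similarity, a standard information retrieval baseline. It is an illustration under stated assumptions, not the framework's actual method: the function names (`tokenize`, `tfidf_vectors`, `cosine`, `align_thematic`), the example segments, and the threshold of 0.2 are all hypothetical. It uses no stopword list, which keeps it language independent; words that occur in every segment receive an IDF of zero instead.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # \w+ is Unicode-aware in Python 3, so accented French words stay whole.
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(segments):
    """Build sparse TF-IDF vectors; IDF is computed over all given segments."""
    token_lists = [tokenize(s) for s in segments]
    n = len(token_lists)
    df = Counter()
    for tokens in token_lists:
        df.update(set(tokens))  # document frequency: count each term once per segment
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in token_lists]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def align_thematic(doc_segments, speech_segments, threshold=0.2):
    """Hypothetical thematic aligner: returns (doc_index, speech_index, score)
    triples for every pair whose similarity reaches the threshold."""
    # Pool both modalities so that IDF reflects the whole meeting.
    vectors = tfidf_vectors(doc_segments + speech_segments)
    doc_vecs = vectors[:len(doc_segments)]
    speech_vecs = vectors[len(doc_segments):]
    links = []
    for i, d in enumerate(doc_vecs):
        for j, s in enumerate(speech_vecs):
            score = cosine(d, s)
            if score >= threshold:
                links.append((i, j, score))
    return links


if __name__ == "__main__":
    doc_segments = [
        "Budget planning for the next quarter",
        "Hiring new engineers for the speech team",
    ]
    speech_segments = [
        "so about the budget for next quarter we need planning",
        "the speech team needs new engineers",
    ]
    for i, j, score in align_thematic(doc_segments, speech_segments):
        print(f"doc segment {i} <-> speech segment {j} (cosine {score:.2f})")
```

With these toy inputs, the budget paragraph links to the first utterance and the hiring paragraph links to the second. The word "the" occurs in all four segments, so its IDF is zero and it cannot create a spurious cross-link. A real system would also need the segmentation step mentioned above to produce the segments at the chosen granularity.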
