A unified structure-based framework for indexing and gisting of meetings

A variety of media involve the spoken interaction of people. For these media to be useful, indexing and browsing facilities must be provided to the user. We present a unified framework for indexing and gisting spoken interactions of people. We use speaker identification, prosody analysis, and word spotting as preprocessing steps to find the structure of the meeting. The structure is modeled using a stochastic approach based on the hidden Markov model. The result of the analysis is an outline or table of contents, as well as a rich set of visual cues for navigating the media. In addition to the automatic analysis, we provide the user with tools for browsing the meeting, as well as tools for directing the analysis and editing the results. We present early results using the proposed framework.
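To illustrate the kind of hidden Markov modeling the abstract refers to, the following is a minimal sketch of Viterbi decoding over a sequence of preprocessed cues. It is not the authors' model: the state names (`monologue`, `discussion`), the observation symbols (`same`, `change`, standing in for the output of a speaker-identification step), and all probabilities are hypothetical values chosen only for illustration.

```python
import math

# Hypothetical meeting-structure states and discretized speaker cues.
# None of these names or numbers come from the paper.
STATES = ["monologue", "discussion"]
START_P = {"monologue": 0.5, "discussion": 0.5}
TRANS_P = {
    "monologue": {"monologue": 0.9, "discussion": 0.1},
    "discussion": {"monologue": 0.1, "discussion": 0.9},
}
EMIT_P = {
    "monologue": {"same": 0.9, "change": 0.1},
    "discussion": {"same": 0.3, "change": 0.7},
}


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations."""
    # Log-probability of the best path ending in each state at time 0.
    best = [{
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }]
    back = []  # back[t][s] = best predecessor of state s at time t + 1

    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev, score = max(
                ((p, best[-1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            scores[s] = score + math.log(emit_p[s][obs])
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    cues = ["same", "same", "same", "change", "change", "same", "change"]
    for cue, state in zip(cues, viterbi(cues, STATES, START_P, TRANS_P, EMIT_P)):
        print(f"{cue:>6} -> {state}")
```

Runs of identical decoded states could then be merged into segments, each of which might serve as one entry in an outline. A real system would presumably combine several cue types and estimate the probabilities from data rather than fixing them by hand.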
