An initial prototype system for Chinese spoken document understanding and organization for indexing/browsing and retrieval applications

The most attractive form of future network content will be multimedia. When such content includes speech, the speech usually carries the core concepts of the content. The spoken document associated with multimedia content may therefore well serve as the key for indexing, browsing and retrieval. Unlike written documents, however, multimedia and voice information very often consists only of audio/video signals. These are difficult to index, browse or retrieve, because users cannot go through each of them from beginning to end while browsing. A possible approach is to segment the audio/video signals automatically into short paragraphs, each with a central concept or topic, and then automatically generate a title and/or a summary for each paragraph, in either speech or text form. The topics and central concepts of the segmented paragraphs may then be further analyzed and organized into graphic structures that describe the relationships among them. The multimedia content can thus be indexed automatically and much more efficiently, and users can browse and retrieve it based on the titles, summaries and graphic structures. We refer to this as the understanding and organization of spoken documents. An initial prototype system for these functions is presented, with broadcast news taken as the example multimedia content. The graphic structures used to describe the relationships among the topics and central concepts are two-dimensional tree structures developed based on probabilistic latent semantic analysis.
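As a rough illustration of the probabilistic latent semantic analysis (PLSA) on which the graphic structures are based, the following is a minimal sketch of the standard PLSA expectation-maximization updates on a small document-by-term count matrix. It is not the prototype's implementation: the function name `plsa`, its parameters, and the toy counts are assumptions made for illustration, and the sketch stops at estimating P(topic | document) and P(term | topic) without building the two-dimensional tree structure.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit plain PLSA by EM (hypothetical helper, for illustration only).

    counts: list of rows, counts[d][w] = occurrences of term w in document d.
    Returns (p_z_d, p_w_z): P(topic | doc) per document, P(term | topic) per topic.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        if total == 0:  # empty document or unused topic: fall back to uniform
            return [1.0 / len(row)] * len(row)
        return [v / total for v in row]

    # Random positive initialisation of both distributions.
    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior P(topic | doc, term), up to normalisation.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    resp = n * joint[z] / total
                    new_z_d[d][z] += resp  # expected topic counts per document
                    new_w_z[z][w] += resp  # expected term counts per topic
        # M-step: renormalise the expected counts into probabilities.
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
    return p_z_d, p_w_z


if __name__ == "__main__":
    # Toy counts: four "documents" over four "terms" (made-up numbers).
    toy = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]]
    doc_topics, topic_terms = plsa(toy, n_topics=2)
    for d, row in enumerate(doc_topics):
        print(d, [round(p, 3) for p in row])
```

In a setting like the one described, each segmented paragraph would play the role of a document, and the estimated topic distributions could then serve as input to whatever procedure arranges the topics into a browsable structure.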
