Spoken document understanding and organization

It would be very helpful if spoken documents (or the associated multimedia content) could be better understood and reorganized so that retrieval and browsing can be performed easily. For example, they could be divided into short paragraphs and arranged in a hierarchical visual presentation, with titles, summaries, and topic labels serving as references for retrieval and browsing. Retrieval can then be based on the full content, on the summaries/titles/topic labels, or on both. In this article, this is referred to as spoken document understanding and organization for efficient retrieval/browsing applications. The purpose of this article is to present a concise, comprehensive, and integrated overview of the related areas within this unified context. In addition, we present an initial prototype system we developed at National Taiwan University as a new example of integrating these technologies and functionalities.
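The following sketch illustrates the three retrieval options: full content, summaries/titles/topic labels, or both. It assumes that each organized paragraph carries its transcription together with an automatically generated title, summary, and topic labels. A text query is then scored against the transcript, against those labels, or against a weighted mix of the two. The class `SpokenDocument`, the function `retrieve`, the term-overlap score, and the weight `alpha` are hypothetical choices made only for this illustration. They are not the method used in this article or in its prototype. A real system would use a proper retrieval model and would have to cope with recognition errors in the transcripts.

```python
from dataclasses import dataclass, field


@dataclass
class SpokenDocument:
    """Hypothetical container for one organized spoken paragraph."""
    doc_id: str
    transcript: str                 # full (possibly erroneous) ASR transcription
    title: str = ""                 # automatically generated title
    summary: str = ""               # automatically generated summary
    topic_labels: list[str] = field(default_factory=list)


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenization; Chinese speech would need word/syllable units.
    return text.lower().split()


def overlap_score(query_terms: list[str], text: str) -> float:
    """Fraction of query terms found in the text (a crude relevance proxy)."""
    if not query_terms:
        return 0.0
    vocab = set(tokenize(text))
    return sum(1 for t in query_terms if t in vocab) / len(query_terms)


def retrieve(query: str, docs: list[SpokenDocument],
             mode: str = "both", alpha: float = 0.5) -> list[tuple[str, float]]:
    """Rank documents by full content ("full"), by labels ("meta"), or by both.

    In "both" mode the score is alpha * full + (1 - alpha) * meta,
    where alpha is an assumed interpolation weight.
    """
    q = tokenize(query)
    scored = []
    for d in docs:
        full = overlap_score(q, d.transcript)
        meta = overlap_score(q, " ".join([d.title, d.summary, *d.topic_labels]))
        if mode == "full":
            score = full
        elif mode == "meta":
            score = meta
        elif mode == "both":
            score = alpha * full + (1 - alpha) * meta
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        scored.append((d.doc_id, score))
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    docs = [
        SpokenDocument("d1", "the central bank raised interest rates today",
                       title="interest rate hike", summary="central bank raises rates",
                       topic_labels=["economy"]),
        SpokenDocument("d2", "heavy rain caused flooding in the south",
                       title="southern floods", summary="rain causes floods",
                       topic_labels=["weather"]),
    ]
    for mode in ("full", "meta", "both"):
        print(mode, retrieve("interest rates economy", docs, mode=mode))
```

In this toy run, the word "economy" appears only in a topic label, so "meta" and "both" rank `d1` higher than "full" alone does. This mirrors why generated titles, summaries, and labels can complement the full content.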