Paper information - A hidden Markov model approach to text segmentation and event tracking

Continuing progress in the automatic transcription of broadcast speech via speech recognition has raised the possibility of applying information retrieval techniques to the resulting (errorful) text. For these techniques to be easily applicable, it is highly desirable that the transcripts be segmented into stories. This paper introduces a general methodology based on HMMs and on classical language modeling techniques for automatically inferring story boundaries and for retrieving stories relating to a specific event. In this preliminary work, we report some highly promising results on accurate text. Future work will apply these techniques to errorful transcripts.
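The abstract does not give the model details, so the following is only a minimal sketch of the general idea under stated assumptions: each hidden state is a topic with its own smoothed unigram language model, each observation is one sentence, and a story boundary is declared wherever the most likely (Viterbi) state sequence changes topic. The names `train_unigram`, `segment`, and `switch_prob`, the add-alpha smoothing, and the uniform switching probability are all illustrative choices, not taken from the paper.

```python
import math
from collections import Counter


def train_unigram(docs, vocab, alpha=1.0):
    """Log-probabilities of an add-alpha smoothed unigram model (assumed smoothing)."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: math.log((counts[w] + alpha) / total) for w in vocab}


def segment(sentences, topic_models, switch_prob=0.1):
    """Viterbi decoding over topic states; needs at least two topics.

    sentences:    list of token lists, one per sentence
    topic_models: dict mapping topic name -> {word: log-probability}
    switch_prob:  hypothetical probability of leaving the current topic
    Returns (topic path, indices of sentences that start a new story).
    """
    topics = list(topic_models)
    k = len(topics)
    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob / (k - 1))

    def emit(topic, sent):
        # Log-likelihood of the sentence under the topic's unigram model;
        # out-of-vocabulary words are ignored.
        lm = topic_models[topic]
        return sum(lm[w] for w in sent if w in lm)

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Initialisation: uniform prior over topics.
    score = {t: -math.log(k) + emit(t, sentences[0]) for t in topics}
    backpointers = []

    # Recursion: best predecessor for each topic at each sentence.
    for sent in sentences[1:]:
        new_score, ptr = {}, {}
        for t in topics:
            best_prev = max(topics, key=lambda p: score[p] + trans(p, t))
            new_score[t] = score[best_prev] + trans(best_prev, t) + emit(t, sent)
            ptr[t] = best_prev
        score = new_score
        backpointers.append(ptr)

    # Backtrace from the best final state.
    path = [max(topics, key=score.get)]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()

    boundaries = [i for i in range(1, len(path)) if path[i] != path[i - 1]]
    return path, boundaries
```

In this sketch, a larger `switch_prob` makes topic changes cheaper and so yields more boundaries; a smaller value favours longer stories.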

[1] Marti A. Hearst. Multi-Paragraph Segmentation of Expository Text, 1994, ACL.

[2] Slava M. Katz, et al. Estimation of probabilities from sparse data for the language model component of a speech recognizer, 1987, IEEE Trans. Acoust. Speech Signal Process.

[3] Rebecca J. Passonneau, et al. Combining Multiple Knowledge Sources for Discourse Segmentation, 1995, ACL.

[4] Hideki Kozima, et al. Text Segmentation Based on Similarity between Words, 1993, ACL.

[5] John D. Lafferty, et al. Text Segmentation Using Exponential Models, 1997, EMNLP.

[6] W. Bruce Croft, et al. Text Segmentation by Topic, 1997, ECDL.