Automatic Voicemail Summarisation for Mobile Messaging

One way to overcome the display and bandwidth limitations of today’s mobile environments is to reduce the amount of data transmitted to end-users by selecting only the principal content, at the cost of a tolerable information loss. This thesis provides a framework for systematically comparing and integrating patterns present in voicemail messages in order to construct summaries suitable for mobile messaging applications. To derive a text representation of voicemail messages we employ automatic speech recognition. Our experiments use the IBM Voicemail Corpus-Part I, a large-vocabulary task over telephone lines whose difficulties include unknown channels and spontaneous, topic-independent speech. We use a hybrid connectionist/HMM approach with a combination of front-ends and multi-style language modelling. This results in a compact system whose accuracy is competitive with that of more complex systems based on Gaussian mixture models. Voicemail summarisation differs from conventional text summarisation in that it cannot assume a perfect transcription and is concerned with reducing brief spoken messages to terse summaries. We have adopted a word-extractive approach in which each word in the transcribed message is represented as a vector of features. The initial realisation of the summarisation component is based on lexical feature weighting, subject to summary length restrictions or compression rates. Frequent message terms are compacted and any terms classified as less informative are excluded from the summaries. The resulting summaries are then converted into a format suitable for transmission over narrowband wireless networks. The platform of choice is WAP Push over SMS, which offers a proactive way to transmit data from servers to mobile devices without explicit user requests, as well as easy and immediate access to particular voicemail messages.
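The word-extractive step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `extract_summary`, the per-word `weights` table, the `stop_words` set standing in for "less informative" terms, and the reading of "compacted" as "repeated terms are kept once" are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def extract_summary(
    words: List[str],
    weights: Dict[str, float],
    stop_words: Set[str],
    compression_rate: float,
) -> List[str]:
    """Keep the highest-weighted words, in spoken order, within a word budget.

    `weights` and `stop_words` are hypothetical inputs; the thesis derives
    its lexical weighting from data rather than from a hand-written table.
    """
    if not 0.0 < compression_rate <= 1.0:
        raise ValueError("compression_rate must be in (0, 1]")

    # Word budget implied by the compression rate (at least one word).
    budget = max(1, int(len(words) * compression_rate))

    candidates = []
    seen = set()
    for position, word in enumerate(words):
        key = word.lower()
        # Drop terms marked as less informative, and keep repeated
        # terms only once (an assumed reading of "compacted").
        if key in stop_words or key in seen:
            continue
        seen.add(key)
        candidates.append((weights.get(key, 0.0), position, word))

    # Select by weight (ties broken by earlier position) ...
    top = sorted(candidates, key=lambda c: (-c[0], c[1]))[:budget]
    # ... then restore the original word order for readability.
    return [word for _, _, word in sorted(top, key=lambda c: c[1])]
```

A transcript of ten words with a compression rate of 0.4 yields a budget of four words, so only the four highest-weighted, non-repeated, non-stop words survive. Naive de-duplication of this kind would damage digit strings such as telephone numbers, which is one reason a real system needs richer features than this sketch uses.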
Machine learning methods are then used to investigate the extent to which lexical and prosodic features can be associated with content in voicemail messages. Prosodic features concern the way in which speech sounds are acoustically realised; those we extracted can be broadly grouped as referring to pitch, energy, word duration and pauses. Many potentially relevant but interrelated features can be identified for this task. We employ a feature selection approach in which the data guide us to an optimal subset of features. Instead of specifying a single classifier and feature set optimised for one particular precision/recall trade-off, we maintain a set of classifiers and feature sets that together cover all possible precision/recall trade-offs. We achieve this by considering the ROC curves of the trained classifiers (with respect to development data) and forming the convex hull of those ROC curves. The relative contribution of a large number of features and derived subsets is compared within two summarisation tasks: a binary decision task and a multi-class task. In the former, the goal is to classify words into those that carry principal content and those that do not; in the latter, the goal is to further classify the principal content words into proper names, telephone numbers, dates/times and other. A series of objective and subjective evaluations using unseen data is also presented. The objective evaluations show significant improvements over the baseline systems, while the subjective evaluations show that users are able to determine message priority and content fairly accurately. The perceived difference in summarisation quality is affected more by errors resulting from automatic transcription than by the automatic summarisation process. This suggests that accurate transcription is essential for successful speech summarisation applications.
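The ROC convex hull construction mentioned above can be sketched as follows. This is an illustrative example under stated assumptions, not the procedure used in the thesis: each trained classifier is reduced to a single hypothetical operating point (false positive rate, true positive rate), and the function name `roc_convex_hull` is invented here. The hull is computed with a standard monotone-chain scan over the points, with the two trivial classifiers (0, 0) and (1, 1) added.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def _cross(o: Point, a: Point, b: Point) -> float:
    """Cross product of vectors o->a and o->b (positive for a left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def roc_convex_hull(points: List[Point]) -> List[Point]:
    """Return the upper convex hull of ROC operating points, left to right.

    The trivial classifiers (0, 0) "reject everything" and (1, 1)
    "accept everything" are always included as end points.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull: List[Point] = []
    for p in pts:
        # Walking left to right along the upper hull, every turn must be
        # clockwise; drop the previous point while it lies on or below
        # the line joining its neighbours.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull
```

Operating points that fall below the hull are dominated: for any trade-off between the two error types, some point on the hull (or an interpolation between two adjacent hull points) performs at least as well. Only the classifiers and feature sets whose points lie on the hull therefore need to be retained.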
Finally, an evaluation framework is proposed that aims to determine which metrics maximise summary quality and minimise delivery costs, by combining user data and comparing system configurations that make different trade-offs.
