Towards Speaker Independent Features for Information Extraction from Meeting Audio Data
The aim of early information extraction systems involved sentence segmentation, topic segmentation, and named entity extraction from text, using typographic cues such as punctuation to define the structure of a passage. As speech recognition technology developed, research progressed to applying these tasks to spoken language. Initially these systems resembled their text-based predecessors: they relied purely on lexical information and consequently disregarded the many cues available in the waveform. One set of acoustic cues that has been introduced successfully is prosodic information. A major challenge for systems that incorporate prosodic information is finding a set of features that is speaker independent: absolute values of pitch and energy vary greatly with speaker and recording environment, so such algorithms employ various normalization techniques to counteract this. The aim of this study is to find simple acoustic features that are speaker independent within meeting conditions. Word boundaries were not defined by the speech recognition transcript; instead, unit boundaries were estimated directly from the audio signal.
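The abstract does not specify which normalization the study uses, but a common way to make pitch features comparable across speakers is per-speaker z-score normalization of log-F0. The sketch below is illustrative only: the function name is invented, and it assumes an F0 contour in Hz where unvoiced frames are marked with 0.

```python
import numpy as np

def zscore_normalize_f0(f0_hz, eps=1e-8):
    """Per-speaker z-score normalization of log-F0 (illustrative sketch).

    Absolute pitch varies widely between speakers, so raw F0 values are
    not directly comparable. Normalizing each speaker's log-F0 to zero
    mean and unit variance yields a speaker-relative feature. Frames
    with f0 == 0 are treated as unvoiced and left at 0.
    """
    f0 = np.asarray(f0_hz, dtype=float)
    voiced = f0 > 0
    log_f0 = np.log(f0[voiced])
    mu, sigma = log_f0.mean(), log_f0.std()
    out = np.zeros_like(f0)
    out[voiced] = (log_f0 - mu) / (sigma + eps)
    return out
```

Applied per speaker (e.g., per channel in a meeting recording), the resulting contour expresses each frame's pitch relative to that speaker's own range rather than in absolute Hz.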