Towards Speaker Independent Features for Information Extraction from Meeting Audio Data

units. Word boundaries were not defined directly by the speech recognition transcript; instead, unit boundaries were estimated directly from the audio signal. Early information extraction systems aimed at sentence segmentation, topic segmentation, and named entity extraction from text, using typographic cues such as punctuation to define the structure of a passage. As speech recognition technology developed, research progressed to apply these tasks to spoken language. Initially these systems resembled their text-based counterparts: they relied purely on lexical information and consequently disregarded the many cues available in the waveform. One set of acoustic cues that has been successfully introduced is prosodic information. A major challenge for systems that incorporate prosodic information is finding a set of features that is speaker independent. Absolute values of pitch and energy vary greatly with speaker and recording conditions, so such algorithms employ various normalization techniques to counteract this. The aim of this study is to find simple acoustic features that are speaker independent under meeting conditions.
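The paper does not specify which normalization scheme is used, but a common and simple choice in the prosodic-feature literature is per-speaker z-score normalization, which removes a speaker's overall pitch or energy level and scale. The sketch below illustrates the idea on hypothetical pitch values; the function name and data are illustrative, not taken from this study.

```python
from statistics import mean, stdev

def zscore_normalize(values):
    """Normalize one speaker's raw feature values (e.g. pitch in Hz)
    to zero mean and unit variance, removing speaker-level offsets."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical pitch contours: two speakers with very different
# absolute ranges yield comparable contours after normalization.
speaker_a = [110.0, 120.0, 115.0, 125.0]   # low-pitched speaker (Hz)
speaker_b = [210.0, 230.0, 220.0, 240.0]   # high-pitched speaker (Hz)

norm_a = zscore_normalize(speaker_a)
norm_b = zscore_normalize(speaker_b)
```

After normalization both speakers' features have zero mean and unit variance, so a downstream classifier sees pitch movement relative to each speaker's own baseline rather than absolute values.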