Adapting to the Speaker in Automatic Speech Recognition
Abstract

Many automatic speech recognisers work on the principle of matching incoming utterances to a library of stored voice templates. There are two main shortcomings of this approach, which can potentially be overcome by careful interface design. Firstly, the templates, collected under strictly controlled conditions, are not necessarily representative of the speaker's normal voice. Secondly, although the speaker's voice is likely to alter during the course of using the speech recogniser, the templates representing that voice will remain unchanged. This will result in a gradual lessening of the similarity of template and utterance. In the context of an information-retrieval task using fully automatic speech recognition, attempts were made to overcome the above problems. It was found that a modified means of template formation, giving rise to more representative templates, could improve recognition figures, especially for female speakers. However, attempts at constantly updating the templates in accordance with drifts in the speaker's diction were ineffectual in this instance. This latter result conflicts with the results of earlier, comparable studies.
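The two ideas in the abstract, matching an utterance to stored templates and updating a template as the speaker's voice drifts, can be illustrated with a minimal sketch. It is not the method used in the study: the fixed-length feature vectors, the Euclidean distance, the running-average update, and the names `recognise`, `adapt`, and `rate` are all assumptions made for illustration. Real template recognisers typically compare time-varying feature sequences rather than single vectors.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recognise(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: distance(utterance, templates[label]))


def adapt(templates, label, utterance, rate=0.2):
    """Move the chosen template towards the utterance (hypothetical running average)."""
    old = templates[label]
    templates[label] = [(1 - rate) * t + rate * u for t, u in zip(old, utterance)]


# Toy template library: one two-dimensional template per vocabulary word.
templates = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}

utterance = [0.8, 0.3]
label = recognise(utterance, templates)

# Update only the template that was matched, leaving the others unchanged.
adapt(templates, label, utterance)
```

With a small `rate`, each accepted utterance shifts its template only slightly, so the library can follow gradual drift. The same update would also pull a template towards a misrecognised utterance, which is one reason adaptation may not help in every setting.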