Acoustic modeling for spontaneous speech recognition using syllable dependent models
This paper proposes a syllable-context-dependent model for spontaneous speech recognition. It is generally assumed that, since spontaneous speech is greatly affected by coarticulation, an acoustic model featuring a longer-range phonemic context is required to achieve a high degree of recognition accuracy. This motivated the authors to investigate a tri-syllable model that takes differences in the preceding and succeeding syllables into account. Since Japanese syllables consist of either a single vowel or a consonant-vowel combination, a tri-syllable model always captures the preceding and succeeding vowels, which are the primary factors in coarticulation. A tri-syllable model is thus capable of efficiently representing coarticulation. The tri-syllable model was trained on spontaneous speech, and its effectiveness was then evaluated on continuous syllable recognition and on statistical language model-based continuous word recognition. Compared to a regular triphone model without state sharing, the correct syllable accuracy of continuous syllable recognition improved from 64.9% to 66.3%, and the word recognition accuracy of statistical language model-based continuous word recognition improved from 88.4% to 89.2%.
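To illustrate the unit construction described in the abstract, the following is a minimal sketch (not from the paper) of how tri-syllable context labels might be derived from a Japanese syllable sequence; the `left-center+right` label format and the `SIL` boundary symbol are assumptions borrowed from common triphone notation, and the syllable segmentation shown is purely illustrative.

```python
# Minimal sketch: building tri-syllable context-dependent units from a
# syllable sequence, by analogy with triphone labels. The label format
# "left-center+right" and the SIL boundary symbol are assumptions, not
# the paper's own notation.

def tri_syllable_labels(syllables, boundary="SIL"):
    """Map a syllable sequence to tri-syllable context labels.

    Each unit records the preceding and succeeding syllable, so the
    vowel context on both sides (the primary coarticulation factor in
    Japanese) is always captured.
    """
    padded = [boundary] + list(syllables) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]

if __name__ == "__main__":
    # "konnichiwa" segmented into syllables (the moraic nasal is
    # treated as its own unit here purely for illustration).
    print(tri_syllable_labels(["ko", "n", "ni", "chi", "wa"]))
    # -> ['SIL-ko+n', 'ko-n+ni', 'n-ni+chi', 'ni-chi+wa', 'chi-wa+SIL']
```

Because every non-initial unit ends in a vowel, each tri-syllable label necessarily encodes the flanking vowels, which is the property the paper exploits for modeling coarticulation.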