Gaussian and Gaussian mixture distributions are widely used in acoustic modeling for speech recognition. The usual justification for choosing the Gaussian distribution when building a parametric model of an unknown ensemble is the central limit theorem, and the distribution also has many well-understood theoretical properties. However, for the specific problem of modeling the time series of acoustic features from a limited number of training samples, there is no guarantee that a Gaussian-based method is always optimal. This paper therefore proposes an acoustic modeling approach based on the generalized Laplacian distribution, which can represent a wider range of distribution shapes and includes the Laplacian and Gaussian distributions as special cases. The formulation of the generalized Laplacian distribution and the method for estimating its parameters are described. The acoustic model with generalized Laplacian mixture output distributions is constructed by retraining a hidden Markov model that has Gaussian mixture output distributions. A continuous speech recognition experiment on naturally uttered speech shows that recognition performance improves over recognition based on the Gaussian mixture distribution. © 2002 Wiley Periodicals, Inc. Electron Comm Jpn Pt 2, 85(11): 32–42, 2002; Published online in Wiley InterScience (www.interscience.wiley.com). DOI 10.1002/ecjb.10093
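The claim that one family covers both the Laplacian and the Gaussian shape can be illustrated with a minimal sketch. It assumes the common generalized Gaussian parameterization, p(x) = β / (2αΓ(1/β)) · exp(−(|x − μ|/α)^β), which may differ from the exact formulation used in the paper. The function names and the parameter names `mu`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
import math


def generalized_laplacian_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density of a generalized Laplacian (generalized Gaussian) variable.

    Assumed parameterization (not taken from the paper):
        p(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    beta = 1 gives the Laplacian, beta = 2 gives the Gaussian.
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def laplacian_pdf(x, mu=0.0, b=1.0):
    """Standard Laplacian density with scale b, used as a reference."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Standard Gaussian density with standard deviation sigma, used as a reference."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

Under this parameterization, `beta=1` matches a Laplacian with scale `alpha`, and `beta=2` matches a Gaussian with standard deviation `alpha / sqrt(2)`. Other values of `beta` give intermediate, more peaked, or flatter shapes.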