Tree distributions approximation model for robust discrete speech recognition

This paper proposes a new discrete speech recognition method that investigates the capability of graphical models based on tree distributions, which are widely used in many optimization areas. A novel spanning tree structure that exploits the temporal nature of the speech signal is proposed. The proposed tree structure significantly reduces complexity because it captures only a few essential relationships rather than considering all possible tree structures. The application of this model is illustrated on several isolated-word databases. Experiments show that, compared with the conventional discrete hidden Markov model (DHMM), the proposed approaches reduce error rates by 2.54 %–12 % and speed up recognition at least 3-fold. In addition, a substantial reduction in learning time is observed. The overall recognition accuracy was 93.09 %–95.34 %, confirming the effectiveness of the proposed methods.
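For readers unfamiliar with tree distributions, the following is a minimal sketch of the classic dependence-tree approximation (the Chow and Liu procedure), not the spanning tree structure proposed in this paper, whose construction the abstract does not specify. It estimates pairwise mutual information from discrete samples and keeps a maximum-weight spanning tree. The function names and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    count_i = Counter(s[i] for s in samples)
    count_j = Counter(s[j] for s in samples)
    count_ij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_edges(samples):
    """Edges of a maximum-weight spanning tree over the variables, weighted
    by pairwise mutual information (Kruskal's algorithm with union-find)."""
    num_vars = len(samples[0])
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,
    )
    parent = list(range(num_vars))

    def find(x):
        # Follow parent links to the root, compressing the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for _, i, j in weighted:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # adding this edge does not create a cycle
            parent[root_i] = root_j
            edges.append((i, j))
    return edges


# Hypothetical toy data: x1 is a noisy copy of x0, and x2 a noisy copy of x1.
samples = (
    [(0, 0, 0)] * 9 + [(0, 0, 1)] * 3 + [(0, 1, 1)] * 3 + [(0, 1, 0)] * 1
    + [(1, 1, 1)] * 9 + [(1, 1, 0)] * 3 + [(1, 0, 0)] * 3 + [(1, 0, 1)] * 1
)
edges = chow_liu_edges(samples)
print(sorted(edges))
```

On this toy data the tree keeps the two strongly dependent pairs (x0, x1) and (x1, x2) and drops the weaker (x0, x2) link. A tree over n variables stores only n - 1 pairwise tables, which is the source of the complexity savings that tree-based models aim for.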
