Speech recognition using randomized relational decision trees

We explore the possibility of recognizing speech signals using a large collection of coarse acoustic events, each describing temporal relations among a small number of local features of the spectrogram. The major issue of invariance to changes in the duration of speech events is addressed by defining temporal relations coarsely, allowing a large degree of slack. The approach is greedy in that it does not offer an "explanation" of the entire signal, as the hidden Markov model (HMM) approach does; rather, it accesses small amounts of relational information to determine a speech unit or class. This implies that we recognize words as units, without recognizing their subcomponents. Multiple randomized decision trees are used to access the large pool of acoustic events in a systematic manner, and their outputs are aggregated to produce the classifier.
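To make the idea concrete, the following is a minimal sketch of randomized relational trees on toy data, not the method as implemented in the paper. Everything in it is an assumption made for illustration: an utterance is represented as a list of (frame, feature id) detections, a coarse acoustic event is a query "feature a is followed by feature b within lo to hi frames", each tree node picks the lowest-entropy query from a small random sample, and trees are aggregated by summing their leaf class distributions.

```python
import math
import random
from collections import Counter

def holds(events, query):
    """True if some detection of feature a is followed by feature b after lo..hi frames."""
    a, b, lo, hi = query
    times_a = [t for t, f in events if f == a]
    times_b = [t for t, f in events if f == b]
    return any(lo <= tb - ta <= hi for ta in times_a for tb in times_b)

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def grow(data, n_features, rng, depth=0, max_depth=4, n_candidates=20):
    """Grow one tree; each node tests the best of a few randomly sampled queries."""
    labels = [y for _, y in data]
    if depth == max_depth or len(set(labels)) <= 1:
        return Counter(labels)  # leaf: class counts
    best = None
    for _ in range(n_candidates):
        lo = rng.randrange(0, 20)
        hi = lo + rng.randrange(10, 30)  # wide window = large temporal slack
        q = (rng.randrange(n_features), rng.randrange(n_features), lo, hi)
        yes = [d for d in data if holds(d[0], q)]
        no = [d for d in data if not holds(d[0], q)]
        if not yes or not no:
            continue  # query does not split this node
        score = (len(yes) * entropy([y for _, y in yes])
                 + len(no) * entropy([y for _, y in no])) / len(data)
        if best is None or score < best[0]:
            best = (score, q, yes, no)
    if best is None:
        return Counter(labels)
    _, q, yes, no = best
    return (q,
            grow(yes, n_features, rng, depth + 1, max_depth, n_candidates),
            grow(no, n_features, rng, depth + 1, max_depth, n_candidates))

def leaf_distribution(tree, events):
    """Follow query answers down to a leaf and return its class distribution."""
    while isinstance(tree, tuple):
        q, yes, no = tree
        tree = yes if holds(events, q) else no
    total = sum(tree.values())
    return {label: count / total for label, count in tree.items()}

def classify(forest, events):
    """Aggregate trees by summing leaf distributions and taking the top class."""
    votes = Counter()
    for tree in forest:
        for label, p in leaf_distribution(tree, events).items():
            votes[label] += p
    return votes.most_common(1)[0][0]

def toy_utterance(label, rng):
    """Hypothetical two-feature 'word' whose duration varies with speaking rate."""
    gap = int(15 * rng.uniform(0.7, 1.5))
    first, second = (0, 1) if label == "A" else (1, 0)
    return [(5, first), (5 + gap, second)]

rng = random.Random(0)
train = [(toy_utterance(y, rng), y) for y in "AB" for _ in range(40)]
test = [(toy_utterance(y, rng), y) for y in "AB" for _ in range(20)]
forest = [grow(train, n_features=2, rng=rng) for _ in range(10)]
accuracy = sum(classify(forest, ev) == y for ev, y in test) / len(test)
print(f"held-out accuracy on the toy task: {accuracy:.2f}")
```

The wide lo-to-hi windows are what stand in for duration invariance here: a query keeps holding when the gap between two features stretches or shrinks, so no explicit time alignment of the whole utterance is needed.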
