Genetic Programming for Detecting Rhythmic Stress in Spoken English

Rhythmic stress detection is an important but difficult problem in speech recognition. This paper describes an approach to the automatic detection of rhythmic stress in New Zealand spoken English. It uses a linear genetic programming system to classify each vowel segment as stressed or unstressed, with speaker-independent prosodic features and vowel quality features as terminals. In addition to the four standard arithmetic operators, the function set includes trigonometric and conditional functions to cope with the complexity of the task. The error rate on the training set is used as the fitness function. The approach is evaluated against a decision tree approach and a support vector machine approach on a speech data set of 703 vowels segmented from 60 utterances by adult female speakers. The genetic programming approach achieved a maximum average accuracy of 92.6%. The results suggest that, on this data set, the genetic programming approach outperforms the decision tree and support vector machine approaches for stress detection in terms of detection accuracy, the ability to handle redundant features, and automatic feature selection.
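To make the setup above more concrete, here is a minimal sketch of how a linear genetic program might classify a vowel segment and how its fitness could be computed as a training error rate. The abstract only states that the system is linear GP, that the function set contains arithmetic, trigonometric and conditional functions, and that fitness is the training error rate. Everything else here is an assumption for illustration: the register-based instruction encoding (`Instr`), the protected-division convention, the "skip the next instruction unless a > b" semantics of the conditional `if>`, and the rule that an output above 0 means "stressed". The feature vectors are hypothetical toy values, not the paper's data.

```python
import math
from dataclasses import dataclass
from typing import Sequence

# Hypothetical register-based encoding; NOT the paper's exact representation.
@dataclass(frozen=True)
class Instr:
    op: str                   # "+", "-", "*", "/", "sin", "cos" or "if>"
    dest: int                 # destination register (ignored by "if>")
    a: int                    # first operand index
    b: int = 0                # second operand index (unused by unary ops)
    a_is_input: bool = False  # True: operand a reads a feature (terminal)
    b_is_input: bool = False  # True: operand b reads a feature (terminal)

def protected_div(x: float, y: float) -> float:
    # Common GP convention (assumed here): return 0.0 for a near-zero divisor.
    return x / y if abs(y) > 1e-9 else 0.0

BINARY = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
          "*": lambda x, y: x * y, "/": protected_div}
UNARY = {"sin": math.sin, "cos": math.cos}

def run(program: Sequence[Instr], features: Sequence[float], n_reg: int = 4) -> float:
    """Execute the instructions in order; register 0 holds the output."""
    r = [0.0] * n_reg

    def val(idx: int, is_input: bool) -> float:
        return features[idx] if is_input else r[idx]

    pc = 0
    while pc < len(program):
        ins = program[pc]
        x = val(ins.a, ins.a_is_input)
        if ins.op == "if>":
            # Assumed conditional semantics: skip the next instruction unless x > y.
            y = val(ins.b, ins.b_is_input)
            pc += 1 if x > y else 2
            continue
        if ins.op in UNARY:
            out = UNARY[ins.op](x)
        else:
            out = BINARY[ins.op](x, val(ins.b, ins.b_is_input))
        # Keep registers finite so later sin/cos calls never see inf or nan.
        r[ins.dest] = out if math.isfinite(out) else 0.0
        pc += 1
    return r[0]

def classify(program: Sequence[Instr], features: Sequence[float]) -> bool:
    # Assumed decision rule: output > 0 -> stressed, otherwise unstressed.
    return run(program, features) > 0.0

def error_rate(program, data, labels) -> float:
    # Fitness: fraction of misclassified vowel segments (lower is better).
    wrong = sum(classify(program, x) != y for x, y in zip(data, labels))
    return wrong / len(data)

# Toy usage: r0 = f0 + f1, then r0 = r0 - f2 (three hypothetical features).
program = [
    Instr("+", 0, 0, 1, a_is_input=True, b_is_input=True),
    Instr("-", 0, 0, 2, b_is_input=True),
]
data = [[0.8, 0.6, 1.0], [0.2, 0.3, 1.0], [0.9, 0.4, 1.0], [0.1, 0.1, 1.0]]
labels = [True, False, False, False]
print(error_rate(program, data, labels))  # one of four segments misclassified
```

In an evolutionary run, a population of such instruction lists would be varied by crossover and mutation, and selection would favour programs with a lower `error_rate` on the training vowels.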
