Articulatory Manner Features Recognition with Linear and Polynomial Kernels

A typical speech recognition system represents speech with acoustic features. Recently, articulatory features have been introduced for the same purpose. They are motivated by linguistic knowledge and may therefore provide a better or complementary representation of the speech signal. We present research on the recognition of such articulatory features by Support Vector Machines (SVMs) with three types of kernels: a linear kernel and two polynomial kernels. As input to the recognizers we use a standard set of Mel-frequency cepstral coefficients extended with the formant and pitch values of the speech signal. Performance is compared with recent results for the same task obtained with other machine learning methods. We conclude that for most of the articulatory features, SVMs with a polynomial kernel give superior performance.
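For readers unfamiliar with the kernels named in the abstract, the following is a minimal sketch of how a linear kernel and a polynomial kernel score a pair of feature vectors. It is not the authors' implementation: the function names, the `gamma` and `coef0` parameters, the polynomial degrees 2 and 3, and the toy three-dimensional vectors are all illustrative assumptions, since the abstract does not state the kernel settings or the feature dimensionality.

```python
from typing import Sequence


def linear_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Linear kernel: the plain dot product of two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(
    x: Sequence[float],
    y: Sequence[float],
    degree: int,
    gamma: float = 1.0,   # assumed scaling factor, not taken from the paper
    coef0: float = 1.0,   # assumed offset, not taken from the paper
) -> float:
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * linear_kernel(x, y) + coef0) ** degree


# Hypothetical toy vectors standing in for per-frame input features
# (in the paper these would be MFCCs extended with formants and pitch).
frame_a = [1.0, 2.0, 0.5]
frame_b = [0.5, -1.0, 2.0]

k_linear = linear_kernel(frame_a, frame_b)
# Degrees 2 and 3 are assumptions chosen only to illustrate
# "two polynomial kernels"; the abstract does not give the degrees.
k_poly2 = polynomial_kernel(frame_a, frame_b, degree=2)
k_poly3 = polynomial_kernel(frame_a, frame_b, degree=3)

print(k_linear, k_poly2, k_poly3)
```

An SVM uses such kernel values in place of explicit dot products, so switching from the linear to a polynomial kernel changes the decision boundary from a hyperplane in the input space to a polynomial surface without changing the training procedure.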
