Selection of Features for Multimodal Vocalic Segments Classification

English speech recognition experiments are presented that employ both audio signals and Facial Motion Capture (FMC) recordings. The principal aim of the study was to evaluate the influence of feature vector dimensionality reduction on the accuracy of vocalic segment classification with neural networks. Several parameter reduction strategies were adopted, namely Extremely Randomized Trees, Principal Component Analysis, and Recursive Parameter Elimination. The feature extraction process is explained, the applied feature selection methods are presented, and the obtained results are discussed.
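The comparison described above can be illustrated with a minimal sketch in scikit-learn. It is not the study's pipeline: the data are synthetic stand-ins for the audio and FMC feature vectors, and the target dimension (`N_KEEP`), the logistic-regression estimator used for recursive elimination, and the network size are all assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import RFE, SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

N_KEEP = 10  # hypothetical target dimension, not taken from the study

# Synthetic stand-in for concatenated audio + FMC feature vectors (assumption).
X, y = make_classification(n_samples=400, n_features=40, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

reducers = {
    # Keep the N_KEEP features ranked highest by Extremely Randomized Trees.
    "extra_trees": SelectFromModel(
        ExtraTreesClassifier(n_estimators=100, random_state=0),
        max_features=N_KEEP, threshold=-np.inf),
    # Project onto the first N_KEEP principal components.
    "pca": PCA(n_components=N_KEEP, random_state=0),
    # Recursively drop the weakest feature until N_KEEP remain
    # (logistic regression as the ranking estimator is an assumption).
    "rfe": RFE(LogisticRegression(max_iter=1000), n_features_to_select=N_KEEP),
}

results = {}
for name, reducer in reducers.items():
    # Same small neural network classifier after each reduction strategy.
    model = make_pipeline(
        StandardScaler(),
        reducer,
        MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
    )
    model.fit(X_train, y_train)
    reduced_dim = model[:-1].transform(X_test).shape[1]
    results[name] = (reduced_dim, model.score(X_test, y_test))

for name, (dim, acc) in results.items():
    print(f"{name}: {dim} features, test accuracy {acc:.3f}")
```

Because every strategy feeds the same classifier with the same number of dimensions, differences in test accuracy can be attributed to the reduction step alone. Note that the tree-based and recursive strategies select a subset of the original features, whereas PCA builds new features as linear combinations of all of them.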
