Relevant mRMR features for visual speech recognition

To improve the accuracy of visual speech recognition systems, forming a subset of relevant visual features from a large set of extracted visual cues is of fundamental importance. In this paper, two feature selection techniques, Principal Component Analysis (PCA) and the relatively recent Minimum Redundancy Maximum Relevance (mRMR) method, are applied separately to the extracted visual features, and the prominent attributes selected by each form the feature vector for classification. Experimental results show that recognition accuracy on an isolated-word database is not affected when only a few mRMR-selected features from the complete visual feature set are used for classification, which considerably reduces computation and storage overheads. Features selected by mRMR are also seen to perform better than PCA features. Both techniques identify inner mouth area segments as the principal features, ahead of the other geometrical parameters.
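As a rough illustration of the mRMR selection referred to above, the sketch below implements the standard greedy mutual-information difference (MID) criterion in Python: relevance of a candidate feature to the class labels minus its mean redundancy with the features already chosen. This is not the paper's pipeline; the function name `mrmr_select`, the use of scikit-learn's mutual-information estimators, and the synthetic data in the usage example are assumptions made purely for illustration.

```python
# Illustrative sketch of greedy mRMR feature selection (MID criterion).
# Not the authors' implementation; assumes continuous features X and
# discrete class labels y.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression


def mrmr_select(X, y, k, random_state=0):
    """Greedy mRMR: pick k feature indices maximizing relevance to the class
    labels minus mean redundancy with already selected features, both
    estimated via mutual information."""
    n_features = X.shape[1]

    # Relevance of each feature: MI between the feature and the class labels.
    relevance = mutual_info_classif(X, y, random_state=random_state)

    selected = [int(np.argmax(relevance))]   # seed with the most relevant feature
    candidates = set(range(n_features)) - set(selected)
    mi_cache = {}                            # cache feature-feature MI (redundancy terms)

    def feature_mi(i, j):
        key = (min(i, j), max(i, j))
        if key not in mi_cache:
            mi_cache[key] = mutual_info_regression(
                X[:, [i]], X[:, j], random_state=random_state
            )[0]
        return mi_cache[key]

    while len(selected) < k and candidates:
        # MID score: relevance minus mean redundancy with the selected set.
        scores = {
            i: relevance[i] - np.mean([feature_mi(i, j) for j in selected])
            for i in candidates
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Toy usage on synthetic data standing in for extracted visual features:
    # 200 samples, 20 features, 10 isolated-word classes.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))
    y = rng.integers(0, 10, size=200)
    print(mrmr_select(X, y, k=5))
```

A PCA baseline, by contrast, would simply project onto the leading principal components of the feature matrix, which ranks directions by variance rather than by class relevance; the sketch above makes that distinction concrete.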
