Information theoretic acoustic feature selection for acoustic-to-articulatory inversion

We use mutual information as the criterion to rank the Mel frequency cepstral coefficients (MFCCs) and their derivatives according to the information they provide about different articulatory features in acoustic-to-articulatory (AtoA) inversion. We find that a small subset of the coefficients encodes maximal information about the articulatory features and, interestingly, that this subset is specific to each articulatory feature. We use these subsets of MFCCs (+derivatives) in AtoA inversion with Gaussian mixture model (GMM) mapping. Inversion experiments with articulatory data support the information-theoretic finding: the subsets of MFCCs (+derivatives) selected by the feature ranking method are sufficient to achieve inversion performance similar to that obtained with a conventional full set of MFCCs (+derivatives). This drastically reduces the modeling complexity of the GMM-based acoustic-articulatory map without significantly degrading inversion performance.

Index Terms: acoustic-to-articulatory inversion, mutual information, Gaussian mixture model.
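The ranking idea described in the abstract can be illustrated with a minimal sketch. It assumes a simple histogram (plug-in) estimate of mutual information and ranks each acoustic coefficient independently against a single articulatory trajectory; the abstract does not specify the estimator or the exact ranking procedure, so both are assumptions. The names `mutual_information` and `rank_features`, the bin count, and the synthetic data are all hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X; Y) in bits for two 1-D arrays.

    Assumption: equal-width bins; the paper's estimator may differ.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                # joint probability table
    px = pxy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = pxy > 0                             # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))


def rank_features(acoustic, articulatory, bins=16):
    """Rank acoustic feature columns by decreasing MI with one articulatory track.

    acoustic: array of shape (frames, coefficients)
    articulatory: array of shape (frames,)
    Returns (order, scores), where order[0] is the most informative column.
    """
    scores = np.array([
        mutual_information(acoustic[:, k], articulatory, bins)
        for k in range(acoustic.shape[1])
    ])
    order = np.argsort(scores)[::-1]
    return order, scores


# Synthetic stand-in for MFCC frames: only column 2 drives the "articulator".
rng = np.random.default_rng(0)
n_frames = 5000
acoustic = rng.standard_normal((n_frames, 6))
articulatory = 2.0 * acoustic[:, 2] + 0.1 * rng.standard_normal(n_frames)

order, scores = rank_features(acoustic, articulatory)
print("ranking:", order)
print("MI (bits):", np.round(scores, 3))
```

On this synthetic data the informative column should be ranked first, and keeping only the top-ranked columns would give the reduced feature set that is then passed to the inversion model. Ranking each coefficient in isolation ignores redundancy between coefficients, which a more careful selection scheme would need to account for.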
