Relevance of time-frequency features for phonetic and speaker-channel classification

Abstract: The concept of mutual information is used to study how speech information is distributed in frequency and in time, with the main focus on the information relevant for phonetic classification. A large database of hand-labeled fluent speech is used to (a) compute the mutual information (MI) between a phonetic classification variable and a single spectral feature variable in the time–frequency plane, and (b) compute the joint mutual information (JMI) between the phonetic classification variable and two feature variables in the time–frequency plane. The MI and JMI of the feature variables serve as relevance measures for selecting inputs to phonetic classifiers. Multi-layer perceptron (MLP) classifiers with one or two inputs are trained to recognize phonemes, which tests the effectiveness of MI- and JMI-based input selection. To analyze non-linguistic sources of variability, speaker-channel labels representing different speakers and different telephone channels are used, and the MI between the speaker-channel variable and one or two feature variables is estimated.
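The MI and JMI described above can be estimated with simple plug-in (histogram) estimators once the spectral features are quantized. The sketch below is not the authors' implementation; it assumes integer-coded phone labels, equal-count quantization of each time-frequency feature, and illustrative function names and bin counts.

```python
# Minimal sketch (assumptions, not the paper's code): plug-in estimates of
# I(C; X) and I(C; X1, X2) between an integer-coded phone label C and
# quantized time-frequency features.  Bin counts are illustrative.
import numpy as np


def _quantize(feature, n_bins):
    """Map a real-valued feature to integer bins with roughly equal counts."""
    edges = np.quantile(feature, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.digitize(feature, edges)            # values in 0 .. n_bins - 1


def mutual_information(labels, feature, n_bins=32):
    """I(C; X) in bits between phone labels C and one spectral feature X."""
    x = _quantize(feature, n_bins)
    counts = np.zeros((labels.max() + 1, n_bins))
    np.add.at(counts, (labels, x), 1.0)
    p_cx = counts / counts.sum()                  # joint distribution p(c, x)
    p_c = p_cx.sum(axis=1, keepdims=True)         # marginal p(c)
    p_x = p_cx.sum(axis=0, keepdims=True)         # marginal p(x)
    nz = p_cx > 0
    return float(np.sum(p_cx[nz] * np.log2(p_cx[nz] / (p_c @ p_x)[nz])))


def joint_mutual_information(labels, feat1, feat2, n_bins=16):
    """I(C; X1, X2) in bits, treating the pair (X1, X2) as one discrete variable."""
    pair = _quantize(feat1, n_bins) * n_bins + _quantize(feat2, n_bins)
    counts = np.zeros((labels.max() + 1, n_bins * n_bins))
    np.add.at(counts, (labels, pair), 1.0)
    p = counts / counts.sum()                     # joint distribution p(c, x1, x2)
    p_c = p.sum(axis=1, keepdims=True)
    p_xy = p.sum(axis=0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (p_c @ p_xy)[nz])))


# Hypothetical usage: phones is an (n_frames,) array of phone indices and
# feats is an (n_frames, n_points) array of spectral values sampled on a
# time-frequency grid around each labeled frame.
#   mi_per_point = [mutual_information(phones, feats[:, j])
#                   for j in range(feats.shape[1])]
#   jmi_pair = joint_mutual_information(phones, feats[:, 0], feats[:, 5])
```

Equal-count bins keep the discretized feature entropies comparable across time-frequency points; with finite data, coarser bins or a bias correction would be needed for the JMI estimate, but the sketch conveys the estimator used as a relevance measure.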
