Paper information - Dimensionality reduction of modulation frequency features for speech discrimination

We describe a dimensionality reduction method for modulation spectral features that preserves the time-varying information relevant to the classification task. Because the acoustic and modulation frequency subspaces differ in redundancy and discriminative power, we first apply a generalization of the SVD to tensors (Higher Order SVD) to reduce dimensions. Projecting the modulation spectral features onto the principal axes with the highest energy in each subspace yields a compact feature set. We then estimate the relevance of these projections to speech discrimination from their mutual information with the target class. Reconstructing modulation spectrograms from the “best” 22 features back to the initial dimensions shows that modulation spectral features close to syllable and phoneme rates, as well as the pitch values of speakers, are preserved.
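The projection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (acoustic frequency x modulation frequency x samples), the sizes, the retained ranks, and the function names `unfold`, `hosvd_bases`, `project`, and `reconstruct` are all assumptions made for illustration. The mutual-information ranking of the projected features is not covered here.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd_bases(tensor, ranks):
    """Truncated orthonormal basis for each of the first len(ranks) modes.

    Each basis holds the leading left singular vectors of the corresponding
    unfolding, i.e. the principal axes with the highest energy in that mode.
    """
    bases = []
    for mode, rank in enumerate(ranks):
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        bases.append(u[:, :rank])
    return bases

def project(sample, bases):
    """Project one (acoustic x modulation) matrix onto the retained axes."""
    u_acoustic, u_modulation = bases
    return u_acoustic.T @ sample @ u_modulation

def reconstruct(core, bases):
    """Map projected features back to the initial dimensions."""
    u_acoustic, u_modulation = bases
    return u_acoustic @ core @ u_modulation.T

# Synthetic stand-in data (assumed sizes, not taken from the paper).
rng = np.random.default_rng(0)
data = rng.random((16, 12, 40))          # acoustic x modulation x samples
bases = hosvd_bases(data, ranks=(4, 3))  # keep 4 acoustic and 3 modulation axes
features = project(data[:, :, 0], bases).ravel()  # compact feature vector
approximation = reconstruct(features.reshape(4, 3), bases)
```

Only the acoustic and modulation modes are decomposed, so the sample axis keeps its full length. Each sample is reduced to a small matrix of projections that can be flattened into a feature vector and later mapped back to a modulation spectrogram of the original size.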

Yannis Stylianou | Maria E. Markaki
