Dimensionality reduction of modulation frequency features for speech discrimination

We describe a dimensionality reduction method for modulation spectral features that preserves the time-varying information relevant to the classification task. Because the acoustic and modulation frequency subspaces differ in redundancy and discriminative power, we first apply a generalization of the SVD to tensors (Higher Order SVD) to reduce dimensions. Projecting the modulation spectral features onto the principal axes with the highest energy in each subspace yields a compact feature set. We then estimate the relevance of these projections to speech discrimination from their mutual information with the target class. Reconstructing modulation spectrograms from the “best” 22 features back to the initial dimensions shows that modulation spectral features close to syllable and phoneme rates, as well as to the pitch values of speakers, are preserved.
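
The pipeline summarized above can be illustrated with a minimal NumPy sketch. It is not the implementation used in this work: the tensor layout (samples x acoustic frequency x modulation frequency), the energy threshold, the histogram-based mutual information estimate, and all function names are assumptions made for illustration only.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def leading_axes(tensor, mode, energy=0.95):
    """Left singular vectors of the mode-n unfolding that together hold
    at least `energy` of the squared singular values (assumed criterion)."""
    u, s, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    cumulative = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return u[:, :k]

def project(features, u_acoustic, u_modulation):
    """Project each acoustic x modulation matrix onto the truncated subspaces.
    Assumed layout of `features`: (n_samples, n_acoustic, n_modulation)."""
    return np.einsum("nam,ai,mj->nij", features, u_acoustic, u_modulation)

def reconstruct(core, u_acoustic, u_modulation):
    """Map projected features back to the initial dimensions."""
    return np.einsum("nij,ai,mj->nam", core, u_acoustic, u_modulation)

def mutual_information(feature, labels, bins=8):
    """Histogram estimate of I(feature; label) in bits (a simple stand-in
    for whichever estimator is preferred)."""
    edges = np.histogram_bin_edges(feature, bins=bins)
    binned = np.digitize(feature, edges[1:-1])  # bin index in 0..bins-1
    classes, label_idx = np.unique(labels, return_inverse=True)
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (binned, label_idx), 1)
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def rank_by_relevance(core, labels, keep=22):
    """Indices (into the flattened projections) of the `keep` features
    with the highest estimated mutual information with the class label."""
    flat = core.reshape(core.shape[0], -1)
    scores = np.array([mutual_information(flat[:, j], labels)
                       for j in range(flat.shape[1])])
    return np.argsort(scores)[::-1][:keep], scores
```

In this sketch, `leading_axes` would be called once for the acoustic frequency mode and once for the modulation frequency mode, so that each subspace can be truncated to a different rank. Reconstruction from a subset of features amounts to zeroing the unselected entries of the projected tensor before calling `reconstruct`.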