Mel, linear, and antimel frequency cepstral coefficients in broad phonetic regions for telephone speaker recognition

We have examined the speaker-discriminative power of mel-, antimel-, and linear-frequency cepstral coefficients (MFCCs, aMFCCs, and LFCCs) in the nasal, vowel, and non-nasal consonant regions of speech. Our inspiration came from the 2007 work of Lu and Dang, who showed that filterbank energies at some frequencies, mainly outside the telephone bandwidth, possess more speaker-discriminative power due to speakers' physiological characteristics, and who derived a set of cepstral coefficients that outperformed MFCCs on non-telephone speech. Using telephone speech, we have found that LFCCs give 21.5% and 15.0% relative equal error rate (EER) improvements over MFCCs in the nasal and non-nasal consonant regions, respectively, in agreement with our f-ratio analysis of filterbank energies. We have also found that using only the vowel region with MFCCs gives a 9.1% relative improvement over using all speech. Finally, we have shown that aMFCCs are valuable in combination, contributing to a system with a 17.3% relative improvement over our baseline.
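The three feature types differ mainly in how the filterbank is spaced along the frequency axis, and the f-ratio measures how well each filterbank channel separates speakers. The sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The 0 to 4000 Hz band (8 kHz telephone sampling), the 24-filter count, the common 2595·log10(1 + f/700) mel formula, and the antimel construction (mel band edges mirrored about the band midpoint, so filters are dense at high frequencies) are all assumptions. The names `center_frequencies` and `f_ratio` are hypothetical helpers. The f-ratio here is the usual ratio of between-speaker variance of the per-speaker channel means to the average within-speaker variance, using population variances.

```python
import numpy as np


def hz_to_mel(f):
    # Assumed mel formula (one common variant)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def center_frequencies(scale, n_filters=24, f_min=0.0, f_max=4000.0):
    """Hypothetical helper: filter center frequencies (Hz) for one warping."""
    if scale == "linear":
        # LFCC-style: uniform spacing in Hz
        edges = np.linspace(f_min, f_max, n_filters + 2)
    elif scale in ("mel", "antimel"):
        # MFCC-style: uniform on the mel scale, so dense at low frequencies
        edges = mel_to_hz(
            np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2))
        if scale == "antimel":
            # Assumed aMFCC-style construction: mirror the mel edges about
            # the band midpoint, so filters are dense at high frequencies
            edges = (f_min + f_max) - edges[::-1]
    else:
        raise ValueError(f"unknown scale: {scale}")
    # Drop the outer band edges; the interior edges are the filter centers
    return edges[1:-1]


def f_ratio(energies, speaker_ids):
    """Per-channel F-ratio of frame-level filterbank energies.

    energies:    (n_frames, n_channels) array, e.g. log filterbank energies
    speaker_ids: (n_frames,) speaker label for each frame
    Returns between-speaker variance of the speaker means divided by the
    mean within-speaker variance, one value per channel.
    """
    energies = np.asarray(energies, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    speakers = np.unique(speaker_ids)
    means = np.stack([energies[speaker_ids == s].mean(axis=0) for s in speakers])
    within = np.stack([energies[speaker_ids == s].var(axis=0) for s in speakers])
    return means.var(axis=0) / within.mean(axis=0)
```

For example, `center_frequencies("mel")` yields center spacings that widen toward 4000 Hz, `center_frequencies("antimel")` yields spacings that narrow toward 4000 Hz, and `f_ratio` applied to region-specific frames (nasal, vowel, or non-nasal consonant) indicates which channels, and hence which filterbank spacing, emphasize speaker-discriminative bands.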