A hybrid parameterization technique for Speaker Identification

Classical parameterization techniques for Speaker Identification encode the power spectral density of raw speech without distinguishing articulatory features produced by vocal tract dynamics (acoustic-phonetics) from glottal source biometry. This paper presents a study on separating voiced fragments of speech into vocal and glottal components. The vocal component is dominated by the vocal tract transfer function, estimated adaptively to track the acoustic-phonetic sequence of the message; the glottal component is dominated by the glottal characteristics of the speaker and the phonation gesture. The separation methodology is based on Joint Process Estimation under the hypothesis that the vocal and glottal spectral distributions are uncorrelated. Its application to voiced speech is presented in the time and frequency domains, and the parameterization methodology is described. Speaker Identification experiments on 245 speakers compare different parameterization strategies. The results confirm that decoupled parameterization performs better than approaches based on plain speech parameterization.
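To illustrate the general idea behind Joint Process Estimation, the sketch below splits a primary signal into a part that is linearly predictable from a reference signal and a residual that is uncorrelated with it. It is only a minimal sketch under stated assumptions: the abstract does not specify the estimator structure, so a normalized LMS (NLMS) transversal filter is swapped in here, the signals are synthetic rather than speech, and the names `nlms_joint_process` and `demo` are hypothetical.

```python
import numpy as np


def nlms_joint_process(reference, primary, order=8, mu=0.1, eps=1e-8):
    """Split `primary` into (estimate, residual) using an NLMS filter.

    `estimate` is the part of `primary` linearly predictable from the last
    `order` samples of `reference`; `residual` is what is left over.
    Assumption: NLMS stands in for whatever estimator the paper uses.
    """
    reference = np.asarray(reference, dtype=float)
    primary = np.asarray(primary, dtype=float)
    n = len(primary)
    w = np.zeros(order)            # adaptive filter weights
    estimate = np.zeros(n)
    residual = np.zeros(n)
    for k in range(n):
        # Most recent `order` reference samples, newest first, zero-padded.
        lo = max(0, k - order + 1)
        seg = reference[lo:k + 1][::-1]
        x = np.zeros(order)
        x[:len(seg)] = seg
        estimate[k] = w @ x
        residual[k] = primary[k] - estimate[k]
        # Normalized LMS weight update driven by the residual.
        w += mu * residual[k] * x / (eps + x @ x)
    return estimate, residual


def demo(seed=0, n=4000):
    """Synthetic check: recover a sine buried in filtered noise."""
    rng = np.random.default_rng(seed)
    reference = rng.standard_normal(n)
    # Component correlated with the reference (hypothetical FIR path).
    correlated = np.convolve(reference, [0.8, -0.4, 0.2])[:n]
    # Component uncorrelated with the reference.
    uncorrelated = np.sin(2 * np.pi * 0.01 * np.arange(n))
    _, residual = nlms_joint_process(reference, correlated + uncorrelated)
    tail = slice(n - 1000, n)      # evaluate after the filter has adapted
    power_before = np.mean(correlated[tail] ** 2)
    power_after = np.mean((residual[tail] - uncorrelated[tail]) ** 2)
    return power_before, power_after


if __name__ == "__main__":
    before, after = demo()
    print(f"correlated power before: {before:.4f}, left in residual: {after:.4f}")
```

In this toy setting the residual should approach the uncorrelated component once the filter has adapted. The filter order and step size `mu` are illustrative choices, not values taken from the paper.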
