A study of automatic phonetic segmentation for forensic voice comparison

Forensic voice comparison (FVC) systems have often relied on manual annotation of usable phonetic units, which requires substantial human labor. Recent research has shown the efficacy of automatic methods in FVC; this paper investigates automatic phonetic segmentation within FVC systems. Nasals and vowels were found to contribute most to improvements in both the validity and the reliability of the system. The results show a trade-off as a function of the duration of the recognized tokens: an improvement in validity corresponds to a degradation in reliability, and vice versa. One implication is that minimizing the error of automatically estimated monophone boundaries may not necessarily yield the best system validity or reliability. Fusing the baseline scores with those from nasal and vowel segments gave substantial improvements over the baseline system: 17.02% in log-likelihood-ratio cost (validity) and 5.97% in 95% credible interval (reliability).
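The validity metric named above, the log-likelihood-ratio cost (Cllr), can be sketched as follows. This is a minimal illustration of the standard Cllr definition, assuming the system outputs likelihood ratios (not log likelihood ratios) for separate sets of same-speaker and different-speaker test comparisons; the function name `cllr` and its argument names are hypothetical and not taken from the paper. Lower values are better, and a system that always outputs LR = 1 scores exactly 1.

```python
import math
from typing import Sequence


def cllr(same_speaker_lrs: Sequence[float],
         different_speaker_lrs: Sequence[float]) -> float:
    """Log-likelihood-ratio cost from likelihood ratios (hypothetical helper).

    Cllr = 0.5 * ( mean over same-speaker LRs of log2(1 + 1/LR)
                 + mean over different-speaker LRs of log2(1 + LR) )
    """
    if not same_speaker_lrs or not different_speaker_lrs:
        raise ValueError("both comparison sets must be non-empty")
    # Same-speaker comparisons are penalized when the LR is small.
    ss_cost = sum(math.log2(1.0 + 1.0 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    # Different-speaker comparisons are penalized when the LR is large.
    ds_cost = sum(math.log2(1.0 + lr) for lr in different_speaker_lrs) / len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)
```

For example, `cllr([1.0], [1.0])` returns `1.0` (an uninformative system), while well-separated likelihood ratios such as `cllr([1e6] * 3, [1e-6] * 3)` give a value close to 0.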
