Evaluation of VOCALISE under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)

This paper presents an evaluation of the commercial forensic automatic speaker recognition system VOCALISE (Voice Comparison and Analysis of the Likelihood of Speech Evidence) (Alexander et al. 2016) under conditions reflecting those of a real forensic case (forensic_eval_01). Full details of the evaluation rules, along with a description of the training data, testing data, and performance metrics, can be found in Morrison and Enzinger (2016). VOCALISE is built with an ‘open-box’ architecture: it offers the user a choice of feature extraction and speaker modelling approaches, and allows users to introduce their own development data at various points in the speaker modelling and comparison pipeline. This evaluation explores several ways in which the VOCALISE architecture can be applied to a realistic forensic comparison case.

[1] James R. Glass, et al. Unsupervised Speaker Adaptation based on the Cosine Similarity for Text-Independent Speaker Verification, 2010, Odyssey.

[2] Anil Alexander, et al. Identifying Perceptually Similar Voices with a Speaker Recognition System Using Auto-Phonetic Features, 2016, INTERSPEECH.

[3] David A. van Leeuwen, et al. An Introduction to Application-Independent Evaluation of Speaker Recognition Systems, 2007, Speaker Classification.

[4] Sadaoki Furui, et al. Speaker-independent isolated word recognition using dynamic features of speech spectrum, 1986, IEEE Trans. Acoust. Speech Signal Process.

[5] Sanjeev Khudanpur, et al. X-Vectors: Robust DNN Embeddings for Speaker Recognition, 2018, 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[6] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.

[7] Haizhou Li, et al. An overview of text-independent speaker recognition: From features to supervectors, 2010, Speech Commun.

[8] Daniel Garcia-Romero, et al. Analysis of i-vector Length Normalization in Speaker Recognition Systems, 2011, INTERSPEECH.

[9] Daniel Povey, et al. MUSAN: A Music, Speech, and Noise Corpus, 2015, ArXiv.

[10] Finnian Kelly, et al. VOCALISE: A forensic automatic speaker recognition system supporting spectral, phonetic, and user-provided features, 2016.

[11] Geoffrey Stewart Morrison, et al. Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) - Introduction, 2016, Speech Commun.

[12] S. Furui, et al. Cepstral analysis technique for automatic speaker verification, 1981.

[13] Patrick Kenny, et al. Front-End Factor Analysis for Speaker Verification, 2011, IEEE Transactions on Audio, Speech, and Language Processing.

[14] Mitchell McLaren, et al. How to train your speaker embeddings extractor, 2018, Odyssey.

[15] Geoffrey Stewart Morrison, et al. Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01) - Introduction, 2016.

[16] James H. Elder, et al. Probabilistic Linear Discriminant Analysis for Inferences About Identity, 2007, 2007 IEEE 11th International Conference on Computer Vision.

[17] Pascal Druyts, et al. Applying Logistic Regression to the Fusion of the NIST'99 1-Speaker Submissions, 2000, Digit. Signal Process.

[18] César A. Medina, et al. Evaluation of MSR Identity Toolbox under conditions reflecting those of a real forensic case (forensic_eval_01), 2017, Speech Commun.

[19] M. Picheny, et al. Comparison of Parametric Representation for Monosyllabic Word Recognition in Continuously Spoken Sentences, 2017.

[20] G. McLachlan. Discriminant Analysis and Statistical Pattern Recognition, 1992.