Likelihood Ratio-based Forensic Voice Comparison on L2 speakers: A Case of Hong Kong native male production of English vowels

This pilot study explores the effectiveness of a likelihood ratio (LR)-based forensic voice comparison (FVC) system built on non-native speech production. More specifically, it examines English vowels produced by male native speakers of Hong Kong Cantonese, and the extent to which FVC can work on these speakers. Fifteen speakers participated, each in two non-contemporaneous recording sessions, producing six predetermined target words: “hello”, “bye”, “left”, “right”, “yes”, and “no”. Formant frequency values were measured from the trajectories of the vowels and surrounding segments. The trajectory of each formant (F1, F2 and F3) was modelled using a discrete cosine transform, and the coefficient values were used as feature vectors in the LR calculations. LRs were calculated using the multivariate-kernel-density method. Results are reported on two performance metrics, namely the log-likelihood-ratio cost and 95% credible intervals. The six best-performing word-specific outputs are presented and compared. We find that an FVC system can be built on L2 speech production, with results comparable to those of similar systems built on native speech.
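As an illustration of two steps named above, the following minimal sketch shows how a formant trajectory might be reduced to discrete cosine transform coefficients and how a log-likelihood-ratio cost could be computed from a set of LRs. It is not the study's implementation: the function names, the unnormalised DCT-II scaling, the number of coefficients, and all numeric values are assumptions made for illustration only.

```python
import math


def dct_coefficients(trajectory, n_coeffs):
    """First n_coeffs unnormalised DCT-II coefficients of a trajectory.

    Scaling conventions differ between toolkits; this one is an assumption.
    """
    n = len(trajectory)
    return [
        sum(x * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, x in enumerate(trajectory))
        for k in range(n_coeffs)
    ]


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost from (non-log) likelihood ratios.

    Same-speaker LRs are penalised for being small, different-speaker
    LRs for being large; an uninformative system (all LRs = 1) scores 1.
    """
    ss = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ds = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    return 0.5 * (ss / len(same_speaker_lrs) + ds / len(different_speaker_lrs))


# Hypothetical F2 trajectory in Hz (invented values, not data from the study).
f2_trajectory = [1200.0, 1350.0, 1500.0, 1650.0, 1750.0, 1800.0]
features = dct_coefficients(f2_trajectory, n_coeffs=3)

# Hypothetical LRs from same-speaker and different-speaker comparisons.
cost = cllr([10.0, 100.0, 5.0], [0.1, 0.01, 0.5])
print(features, cost)
```

Lower cost values indicate better performance, and values at or above 1 indicate that the LRs carry no useful information. In a full system the coefficients from F1, F2 and F3 would be concatenated into one feature vector per token before the LR calculation.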
