Presentation Matters: Evaluating Speaker Identification Tasks

This paper details our evaluation and comparison of listeners' speaker identification (SID) performance across different tasks. Participants in Experiment 1 completed traditional target-lineup (1-out-of-N speakers, or out-of-set speaker) and binary (speaker verification) tasks. Participants in Experiment 2 completed trials online using a clustering method, grouping speech recordings into speaker-specific clusters. Both studies used similar speech recordings from the PTSVOX corpus. Our results showed that participants who completed the binary and clustering tasks achieved higher accuracy than those who completed the target-lineup task. We also observed that, independent of task, participants found some speakers significantly more difficult to identify relative to their foils. Pearson correlation analyses showed significant negative correlations between accuracy and task-dependent temporal metrics across tasks: the more time participants required to make their determinations, the lower their perceptual SID performance. These findings underscore the important role of SID task design and of the process used to select speech recordings. Future work aims to examine the relationship between performance on different perceptual SID tasks and the scores generated by automatic speaker verification systems.
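To make the reported correlation analysis concrete, the sketch below shows one way such a test could be run: a Pearson correlation between per-participant SID accuracy and decision time. This is a minimal illustration, not the authors' analysis code; the variable names and example values are hypothetical.

```python
# Minimal sketch of a Pearson correlation between SID accuracy and decision time.
# Not the authors' code; data values below are hypothetical placeholders.
from scipy.stats import pearsonr

# Hypothetical per-participant data for one task condition:
# proportion of correct identifications and mean decision time in seconds.
accuracy = [0.92, 0.85, 0.78, 0.88, 0.70, 0.65, 0.81, 0.74]
decision_time_s = [4.1, 5.0, 6.8, 4.9, 8.2, 9.1, 5.7, 7.3]

r, p_value = pearsonr(accuracy, decision_time_s)
print(f"Pearson r = {r:.2f}, p = {p_value:.3f}")
# A significant negative r would indicate that longer decision times are
# associated with lower identification accuracy, the pattern reported above.
```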
