Corpora for the Evaluation of Robust Speaker Recognition Systems

This paper describes the significant corpora available to support speaker recognition research and evaluation, along with details of their collection and design. We describe the attributes of high-quality speaker recognition corpora and discuss considerations of application, domain, and performance metrics. We also present a literature survey of the corpora used in speaker recognition research over the last 10 years. Finally, we identify the corpora most commonly used in the research community and assess how well they have enabled meaningful speaker recognition research.
