Human and Machine Speaker Recognition Based on Short Trivial Events

Human speech often contains events that we call trivial events, e.g., coughs, laughs, and sniffs. Compared to regular speech, these trivial events are usually short and highly variable, so they are generally regarded as not speaker discriminative and are largely ignored by current speaker recognition research. However, trivial events are highly valuable in particular circumstances such as forensic examination, because they are less subject to intentional change and can therefore be used to identify the genuine speaker behind disguised speech. In this paper, we collect a trivial-event speech database that involves 75 speakers and 6 types of events, and report preliminary speaker recognition results on this database, obtained by both human listeners and machines. In particular, the deep feature learning technique recently proposed by our group is used to analyze and recognize the trivial events, leading to acceptable equal error rates (EERs) ranging from 5% to 15% despite the extremely short durations (0.2-0.5 seconds) of these events. Among the different event types, 'hmm' appears to be the most speaker discriminative.
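As context for how such EERs can be computed from short-event trials, the following is a minimal Python/NumPy sketch. It assumes frame-level deep speaker features have already been extracted for each short event by a feature-learning network, as described above; the function names cosine_score and equal_error_rate are hypothetical, and the scoring and EER details are a generic illustration rather than the authors' exact evaluation code.

    import numpy as np

    def cosine_score(enroll_frames, test_frames):
        """Average frame-level deep features into utterance-level
        vectors, then score the trial with cosine similarity."""
        e = enroll_frames.mean(axis=0)
        t = test_frames.mean(axis=0)
        return float(np.dot(e, t) /
                     (np.linalg.norm(e) * np.linalg.norm(t) + 1e-12))

    def equal_error_rate(scores, labels):
        """Compute an approximate EER from trial scores and labels
        (1 = target/genuine trial, 0 = impostor trial)."""
        scores = np.asarray(scores, dtype=float)
        labels = np.asarray(labels, dtype=int)
        order = np.argsort(-scores)      # sweep the threshold from high to low
        labels = labels[order]
        n_tar = labels.sum()
        n_imp = len(labels) - n_tar
        # Accepting the top-k scores: false accepts are impostors among them,
        # false rejects are targets that fall below the threshold.
        fa = np.cumsum(1 - labels) / max(n_imp, 1)    # false-acceptance rate
        fr = 1.0 - np.cumsum(labels) / max(n_tar, 1)  # false-rejection rate
        idx = np.argmin(np.abs(fa - fr))              # point where FAR ~= FRR
        return float((fa[idx] + fr[idx]) / 2.0)

Feeding cosine_score over a set of target and impostor trials built from 0.2-0.5 second event segments, and then passing the resulting scores and labels to equal_error_rate, yields the kind of operating point summarized by the EER figures quoted above.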
