Speaker verification in score-ageing-quality classification space

A challenge in automatic speaker verification is to create a system that is robust to the effects of vocal ageing. To observe the ageing effect, a speaker's voice must be analysed over a period of time, during which variation in the quality of the voice samples is likely to be encountered. Thus, in dealing with the ageing problem, the related issue of quality must also be addressed. We present a solution to speaker verification across ageing that uses a stacked classifier framework to combine ageing and quality information with the scores of a baseline classifier. In tandem, we present the Trinity College Dublin Speaker Ageing database of 18 speakers, each recorded over a span of 30 to 60 years. An evaluation of a baseline Gaussian Mixture Model-Universal Background Model (GMM-UBM) system on this database demonstrates a progressive degradation in genuine speaker verification scores as ageing progresses. Consequently, applying a conventional threshold, determined using scores at the time of enrolment, results in poor long-term performance. The influence of quality on verification scores is investigated via a number of quality measures. Alongside established signal-based measures, a new model-based measure, Wnorm, is proposed, and its utility is demonstrated on the CSLU database. By combining ageing information with quality measures and the scores from the GMM-UBM system, we create a verification decision boundary in score-ageing-quality space. The best performance is achieved by using scores and ageing in conjunction with the new Wnorm quality measure, reducing verification error by 45% relative to the baseline. This work represents the first comprehensive analysis of speaker verification on a longitudinal speaker database and successfully addresses the associated variability from ageing and quality artefacts.
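To make the contrast between a fixed enrolment-time threshold and a decision boundary in score-ageing-quality space concrete, here is a minimal sketch under stated assumptions. The abstract does not name the second-level classifier, so this sketch assumes a plain logistic regression trained by batch gradient descent. The scores are synthetic, generated by a hypothetical `make_trials` helper whose constants are illustrative only and are not drawn from the TCD database. The "quality" feature is a generic scalar in [0, 1], not Wnorm, whose definition is not given here. The enrolment-time threshold rule (midpoint of the class means) is likewise an assumption.

```python
import math
import random


def make_trials(n, rng, max_decades=5.0):
    """Hypothetical toy generator (illustrative constants, not real data).

    Genuine scores drift downward with time since enrolment and with low
    quality; impostor scores do not depend on either.
    Each trial is ([score, decades_since_enrolment, quality], label).
    """
    trials = []
    for i in range(n):
        genuine = i % 2 == 0
        decades = rng.uniform(0.0, max_decades)
        quality = rng.uniform(0.0, 1.0)  # 1.0 = clean recording (assumed scale)
        if genuine:
            score = 2.0 - 0.4 * decades - 0.5 * (1.0 - quality) + rng.gauss(0.0, 0.3)
        else:
            score = -1.0 + rng.gauss(0.0, 0.3)
        trials.append(([score, decades, quality], 1 if genuine else 0))
    return trials


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(trials, lr=0.3, epochs=1000):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    dim = len(trials[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(trials)
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for x, y in trials:
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dim):
                gw[j] += err * x[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def error_rate(trials, accept):
    """Fraction of trials where the accept/reject decision disagrees with the label."""
    return sum(accept(x) != bool(y) for x, y in trials) / len(trials)


rng = random.Random(0)
enrol = make_trials(200, rng, max_decades=0.0)  # trials at enrolment time only
train = make_trials(400, rng)                   # trials spread over 0-5 decades
test = make_trials(1000, rng)

# Baseline: a single score threshold fixed from enrolment-time trials.
gen = [x[0] for x, y in enrol if y == 1]
imp = [x[0] for x, y in enrol if y == 0]
threshold = (sum(gen) / len(gen) + sum(imp) / len(imp)) / 2.0

# Stacked decision: linear boundary in (score, ageing, quality) space.
w, b = train_logistic(train)

fixed_err = error_rate(test, lambda x: x[0] > threshold)
stacked_err = error_rate(test, lambda x: sum(wj * xj for wj, xj in zip(w, x)) + b > 0.0)
print(f"fixed threshold error:    {fixed_err:.3f}")
print(f"stacked classifier error: {stacked_err:.3f}")
```

Because the learned boundary is linear, accepting when w0·score + w1·ageing + w2·quality + b > 0 (with w0 > 0) is equivalent to comparing the score against a threshold that moves with ageing and quality. This is what lets it follow the downward drift of genuine scores that a single enrolment-time threshold misses.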
