Speaker anonymisation using the McAdams coefficient

Anonymisation has the goal of manipulating speech signals to degrade the reliability of automatic approaches to speaker recognition, while preserving other aspects of speech, such as those relating to intelligibility and naturalness. This paper reports an approach to anonymisation that, unlike other current approaches, requires no training data, is based upon well-known signal processing techniques and is both efficient and effective. The proposed solution uses the McAdams coefficient to transform the spectral envelope of speech signals. Results derived using common VoicePrivacy 2020 databases and protocols show that random, optimised transformations can outperform competing solutions in terms of anonymisation while causing only modest additional degradation to intelligibility, even in the case of a semi-informed privacy adversary.
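
The abstract does not spell out the transformation itself. A common reading of McAdams-coefficient anonymisation, assumed here rather than taken from this text, fits a linear-prediction (LP) model to each short frame, finds the poles of the LP synthesis filter, and replaces each pole angle phi by phi^alpha, where alpha is the McAdams coefficient, before resynthesising the frame from the LP residual. The minimal sketch below covers only that pole-warping step; LP analysis, root finding and overlap-add resynthesis are assumed to happen elsewhere, and the function names are hypothetical.

```python
import cmath
import math
from typing import List


def mcadams_shift_poles(poles: List[complex], alpha: float) -> List[complex]:
    """Hypothetical helper: map each pole angle phi to sign(phi) * |phi| ** alpha.

    Pole radii are kept, so a stable filter stays stable; only the resonance
    (formant) frequencies move.
    """
    shifted = []
    for p in poles:
        if abs(p.imag) < 1e-12:
            # Real poles (angle 0 or pi) are left untouched so that the
            # conjugate-pair structure, and hence real LP coefficients, survive.
            shifted.append(p)
            continue
        radius, phi = abs(p), cmath.phase(p)  # phi in radians, in (-pi, pi]
        new_phi = math.copysign(abs(phi) ** alpha, phi)
        shifted.append(cmath.rect(radius, new_phi))
    return shifted


def poly_from_roots(roots: List[complex]) -> List[float]:
    """Expand prod_k (1 - r_k z^-1) into LP coefficients [1, a_1, ..., a_p]."""
    coeffs = [1 + 0j]
    for r in roots:
        nxt = coeffs + [0j]
        for i in range(1, len(nxt)):
            nxt[i] -= r * coeffs[i - 1]
        coeffs = nxt
    # Conjugate pairs make the imaginary parts cancel up to rounding error.
    return [c.real for c in coeffs]


def warp_lp_envelope(poles: List[complex], alpha: float) -> List[float]:
    """Hypothetical wrapper: warp a frame's LP poles and return new LP coefficients."""
    return poly_from_roots(mcadams_shift_poles(poles, alpha))
```

For example, `warp_lp_envelope(poles, 0.8)` on the poles of one frame returns LP coefficients whose resonances have moved; 0.8 is an arbitrary illustrative value. Because angles are in radians, alpha < 1 pushes poles below 1 rad upwards and poles above 1 rad downwards, while alpha = 1 leaves the envelope unchanged.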
