Reversible Speech De-identification Using Parametric Transformations and Watermarking

This paper presents a system that de-identifies speech signals in order to hide and protect the identity of the speaker. Using a flexible wideband harmonic model, it applies a relatively simple yet effective transformation to the pitch and to the frequency axis of the spectral envelope. It also embeds the parameters of the transformation in the signal itself by means of watermarking techniques, which makes re-identification possible. Our experiments show that, with suitably chosen modification factors, the system performs satisfactorily in terms of quality, degree of de-identification, and naturalness. The limitations imposed by the signal processing framework are discussed as well.
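As a rough illustration of why a parametric transformation of this kind can be undone once its parameters are known, the Python sketch below scales the pitch by a factor `beta` and warps the frequency axis of a sampled spectral envelope with a factor `alpha`. The piecewise-linear warping function, the `knee` parameter, and all function names are assumptions made for this example. The abstract does not specify the exact transformation, and the harmonic analysis/synthesis and watermarking stages are not reproduced here.

```python
import numpy as np

def warp(f, alpha, f_max, knee=0.6):
    """Hypothetical piecewise-linear, monotonic map of [0, f_max] onto itself.

    Frequencies below knee * f_max are scaled by alpha; the remainder is
    mapped linearly so that f_max stays fixed.
    """
    xk = knee * f_max
    yk = alpha * xk
    if not 0.0 < yk < f_max:
        raise ValueError("alpha * knee must lie strictly between 0 and 1")
    return np.interp(f, [0.0, xk, f_max], [0.0, yk, f_max])

def unwarp(f, alpha, f_max, knee=0.6):
    """Inverse of warp(): swap the breakpoints of the piecewise-linear map."""
    xk = knee * f_max
    yk = alpha * xk
    if not 0.0 < yk < f_max:
        raise ValueError("alpha * knee must lie strictly between 0 and 1")
    return np.interp(f, [0.0, yk, f_max], [0.0, xk, f_max])

def deidentify(f0, env, freqs, alpha, beta):
    """Scale the pitch by beta and move envelope features from f to warp(f)."""
    f_max = freqs[-1]
    # The new envelope at frequency f takes the old value at unwarp(f).
    new_env = np.interp(unwarp(freqs, alpha, f_max), freqs, env)
    return beta * f0, new_env

def reidentify(f0, env, freqs, alpha, beta):
    """Undo deidentify() given the same (alpha, beta) parameters."""
    f_max = freqs[-1]
    old_env = np.interp(warp(freqs, alpha, f_max), freqs, env)
    return f0 / beta, old_env

# Toy data: a smooth two-peak envelope on a uniform grid up to 8 kHz.
freqs = np.linspace(0.0, 8000.0, 513)
env = (np.exp(-0.5 * ((freqs - 700.0) / 300.0) ** 2)
       + 0.5 * np.exp(-0.5 * ((freqs - 2500.0) / 300.0) ** 2))
f0 = 120.0
alpha, beta = 1.15, 1.3

f0_mod, env_mod = deidentify(f0, env, freqs, alpha, beta)
f0_rec, env_rec = reidentify(f0_mod, env_mod, freqs, alpha, beta)
max_err = float(np.max(np.abs(env_rec - env)))
```

In this sketch the restored envelope matches the original only up to interpolation error, and recovery depends entirely on knowing `alpha` and `beta`. That dependence is the reason the system carries the transformation parameters inside the signal as a watermark.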