Paper information - Reversible Speech De-identification Using Parametric Transformations and Watermarking

This paper presents a system capable of de-identifying speech signals in order to hide and protect the identity of the speaker. It applies a relatively simple yet effective transformation of the pitch and the frequency axis of the spectral envelope thanks to a flexible wideband harmonic model. Moreover, it inserts the parameters of the transformation in the signal by means of watermarking techniques, thus enabling re-identification. Our experiments show that for adequate modification factors its performance is satisfactory in terms of quality, de-identification degree and naturalness. The limitations due to the signal processing framework are discussed as well.
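The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the use of one pitch factor `beta` and one linear frequency-warping factor `alpha`, the 0.5 to 2.0 parameter range, and the 8-bit quantization. The actual watermark embedding is not shown; the sketch only produces the bit payload that such a watermark would carry and shows how the recovered factors let the pitch scaling be undone.

```python
# Hypothetical sketch: parametric de-identification with a recoverable payload.
# All names, ranges and bit widths are illustrative assumptions.

def scale_pitch(f0_track, beta):
    """Multiply voiced F0 values (Hz) by beta; 0.0 marks unvoiced frames."""
    return [f * beta if f > 0 else 0.0 for f in f0_track]

def warp_envelope(envelope, alpha):
    """Linearly warp the frequency axis of a sampled spectral envelope.

    Output bin k takes the value of the input at position k / alpha
    (linear interpolation, clamped to the last bin).
    """
    n = len(envelope)
    warped = []
    for k in range(n):
        src = k / alpha
        lo = int(src)
        if lo >= n - 1:
            warped.append(envelope[-1])
        else:
            frac = src - lo
            warped.append((1 - frac) * envelope[lo] + frac * envelope[lo + 1])
    return warped

def _quantize(x, bits, lo, hi):
    """Map x in [lo, hi] to an integer code of the given bit width."""
    levels = (1 << bits) - 1
    x = min(max(x, lo), hi)
    return round((x - lo) / (hi - lo) * levels)

def pack_params(alpha, beta, bits=8, lo=0.5, hi=2.0):
    """Quantize both factors and return them as a flat list of bits."""
    payload = []
    for value in (alpha, beta):
        code = _quantize(value, bits, lo, hi)
        payload.extend((code >> i) & 1 for i in reversed(range(bits)))
    return payload

def unpack_params(payload, bits=8, lo=0.5, hi=2.0):
    """Recover the (quantized) factors from the bit payload."""
    levels = (1 << bits) - 1
    values = []
    for start in (0, bits):
        code = 0
        for b in payload[start:start + bits]:
            code = (code << 1) | b
        values.append(lo + code / levels * (hi - lo))
    return tuple(values)

# De-identify: transform, and keep the factors as a payload for watermarking.
f0 = [0.0, 100.0, 120.0]
alpha, beta = 1.1, 1.25
hidden_f0 = scale_pitch(f0, beta)
payload = pack_params(alpha, beta)

# Re-identify: read the factors back and apply the inverse pitch scaling.
alpha_hat, beta_hat = unpack_params(payload)
restored_f0 = scale_pitch(hidden_f0, 1.0 / beta_hat)
```

Because the factors are quantized before being stored, the restored pitch is only approximately equal to the original. The envelope warping in this sketch is also not exactly invertible, since bands pushed past the last bin are clamped.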

Inma Hernáez | Daniel Erro | Aitor Valdivielso

[1] Yannis Stylianou, et al. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification, 1996.

[2] Florin Curelaru, et al. Front-End Factor Analysis For Speaker Verification, 2018, 2018 International Conference on Communications (COMM).

[3] Andries P. Hekstra, et al. Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs, 2001, 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.01CH37221).

[4] T. Moon. Error Correction Coding: Mathematical Methods and Algorithms, 2005.

[5] P. Boersma. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound, 1993.

[6] Daniel Erro, et al. Piecewise linear definition of transformation functions for speaker de-identification, 2016, 2016 First International Workshop on Sensing, Processing and Learning for Intelligent Machines (SPLINE).

[7] Miran Pobar, et al. Online speaker de-identification using voice transformation, 2014, 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO).

[8] Daniel Erro, et al. Reversible speaker de-identification using pre-trained transformation functions, 2017, Comput. Speech Lang.

[9] Jon Sánchez, et al. Speech Watermarking Based on Coding of the Harmonic Phase, 2014, IberSPEECH.

[10] Yannis Stylianou, et al. Analysis and Synthesis of Speech Using an Adaptive Full-Band Harmonic Model, 2013, IEEE Transactions on Audio, Speech, and Language Processing.

[11] Nikola Pavesic, et al. De-identification for privacy protection in multimedia content: A survey, 2016, Signal Process. Image Commun.

[12] Akram M. Zeki, et al. Watermarking technique based on ISB (Intermediate Significant Bit), 2010.

[13] Guillermo Morales-Luna, et al. Audio Watermarking Based on Echo Hiding with Zero Error Probability, 2013, Int. J. Comput. Sci. Appl.

[14] Daniel Erro, et al. Flexible harmonic/stochastic speech synthesis, 2007, SSW.

[15] Tanja Schultz, et al. Speaker de-identification via voice transformation, 2009, 2009 IEEE Workshop on Automatic Speech Recognition & Understanding.

[16] Simon Dobrisek, et al. Speaker de-identification using diphone recognition and speech synthesis, 2015, 2015 11th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG).

[17] Syed Abdul Rahman Al-Haddad, et al. An overview of digital speech watermarking, 2013, Int. J. Speech Technol.

[18] Darko Kirovski, et al. Spread-spectrum watermarking of audio signals, 2003, IEEE Trans. Signal Process.

[19] Keiichi Tokuda, et al. Mel-generalized cepstral analysis - a unified approach to speech spectral estimation, 1994, ICSLP.