A method for embedding data in an audio signal using cepstral-domain modification is described. Building on successful embedding in the spectral points of perceptually masked regions in each frame of speech, the technique was first extended to embedding in the log spectral domain. This extension yielded approximately 62 bits/s of embedding with a bit error rate (BER) of less than 2 percent for clean cover speech (from the TIMIT database) and about 2.5 percent for noisy speech (from an air traffic controller database) when all frames, including silence and transitions between voiced and unvoiced segments, were used. The BER increased significantly when the log spectrum in the vicinity of a formant was modified. In the next procedure, embedding by altering the mean cepstral values of two ranges of indices was studied. Tests on both a noisy utterance and a clean utterance indicated a barely noticeable perceptual change in speech quality when the lower range of cepstral indices, corresponding to the vocal tract region, was modified in accordance with the data. With an embedding capacity of approximately 62 bits/s, using one bit per frame regardless of frame energy or type of speech, initial results showed a BER of less than 1.5 percent for a payload capacity of 208 embedded bits using the clean cover speech. A BER of less than 1.3 percent resulted for the noisy host with a capacity of 316 bits. When the cepstrum was modified in the region of excitation, the BER increased to over 10 percent. Because quantization caused no significant problem, the technique warrants further study with different cepstral ranges and sizes. Pitch-synchronous cepstrum modification, for example, may be more robust to attacks. In addition, cepstrum modification in regions of speech that are perceptually masked, analogous to embedding in frequency-masked regions, may yield imperceptible stego audio with low BER.
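The idea of carrying one bit per frame in the mean of a range of cepstral values can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length `FRAME_LEN`, the index range `LO`..`HI`, the target magnitude `DELTA`, and the rule "positive mean encodes 1, negative mean encodes 0" are all assumptions chosen for illustration, and the sketch makes no attempt to model perceptual quality, quantization, or attacks.

```python
import numpy as np

FRAME_LEN = 256   # assumed frame size in samples (not stated in the abstract)
LO, HI = 8, 24    # assumed low-quefrency index range [LO, HI)
DELTA = 0.05      # assumed magnitude of the embedded range mean


def real_cepstrum(frame):
    """Return the real cepstrum of a frame and the phase of its spectrum."""
    spectrum = np.fft.rfft(frame)
    log_mag = np.log(np.abs(spectrum) + 1e-12)  # small offset avoids log(0)
    return np.fft.irfft(log_mag, n=len(frame)), np.angle(spectrum)


def embed_bit(frame, bit):
    """Force the mean of cepstral indices [LO, HI) to +DELTA (1) or -DELTA (0)."""
    n = len(frame)
    cep, phase = real_cepstrum(frame)
    target = DELTA if bit else -DELTA
    shift = target - cep[LO:HI].mean()
    cep = cep.copy()
    cep[LO:HI] += shift
    # Shift the mirrored indices too, so the cepstrum stays symmetric
    # and the rebuilt log-magnitude spectrum stays real.
    cep[n - HI + 1:n - LO + 1] += shift
    log_mag = np.fft.rfft(cep).real
    # Recombine the modified magnitude with the original phase.
    return np.fft.irfft(np.exp(log_mag) * np.exp(1j * phase), n=n)


def extract_bit(frame):
    """Read the bit back from the sign of the cepstral range mean."""
    cep, _ = real_cepstrum(frame)
    return int(cep[LO:HI].mean() > 0)


def embed(signal, bits):
    """Embed one bit in each consecutive non-overlapping frame."""
    out = np.array(signal, dtype=float)
    for i, bit in enumerate(bits):
        start = i * FRAME_LEN
        out[start:start + FRAME_LEN] = embed_bit(out[start:start + FRAME_LEN], bit)
    return out


def extract(signal, n_bits):
    """Recover n_bits from consecutive non-overlapping frames."""
    return [extract_bit(signal[i * FRAME_LEN:(i + 1) * FRAME_LEN])
            for i in range(n_bits)]
```

Under these assumptions, `extract(embed(signal, bits), len(bits))` returns the embedded bits when the stego signal is left untouched. Choosing a range at higher indices would correspond to modifying the excitation region, which the abstract reports as far less reliable.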