Hiding Video in Audio via Reversible Generative Models

We present a method for hiding video content inside audio files while preserving the perceptual fidelity of the cover audio. This is a form of cross-modal steganography and is particularly challenging due to the high bitrate of video. Our scheme uses recent advances in flow-based generative models, which enable mapping audio to latent codes such that nearby codes correspond to perceptually similar signals. We show that compressed video data can be concealed in the latent codes of audio sequences while preserving the fidelity of both the hidden video and the cover audio. We can embed 128x128 video inside same-duration audio, or higher-resolution video inside longer audio sequences. Quantitative experiments show that our approach outperforms relevant baselines in steganographic capacity and fidelity.
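The core idea, hiding a payload in the latent code of an invertible model and mapping back to the signal domain, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the method described in this paper: the flow is a tiny untrained stack of additive coupling layers (in the style of NICE) rather than a trained audio model, the payload is random bits rather than compressed video, and the embedding rule is plain parity quantization of the latent values with a hypothetical step size `delta`. Because the toy flow is untrained, it does not have the property that nearby codes correspond to perceptually similar signals; the sketch only shows that exact invertibility lets the receiver recover the payload.

```python
import numpy as np


class AdditiveCoupling:
    """Toy invertible layer: y1 = x1, y2 = x2 + m(x1)."""

    def __init__(self, dim, rng):
        half = dim // 2
        # Fixed random weights; a real model would learn these.
        self.W = rng.normal(scale=0.5, size=(half, half))

    def _m(self, a):
        return np.tanh(a @ self.W)

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        return np.concatenate([x1, x2 + self._m(x1)], axis=-1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        return np.concatenate([y1, y2 - self._m(y1)], axis=-1)


class ToyFlow:
    """Stack of coupling layers with a channel reversal after each one."""

    def __init__(self, dim, n_layers, rng):
        self.layers = [AdditiveCoupling(dim, rng) for _ in range(n_layers)]

    def forward(self, x):
        for layer in self.layers:
            x = layer.forward(x)[..., ::-1]  # reversal is its own inverse
        return x

    def inverse(self, z):
        for layer in reversed(self.layers):
            z = layer.inverse(z[..., ::-1])
        return z


def embed_bits(flow, cover, bits, delta=0.05):
    """Move each latent value to the nearest multiple of delta whose
    integer index has the same parity as the corresponding payload bit."""
    z = flow.forward(cover)
    q = 2 * np.rint((z / delta - bits) / 2) + bits
    return flow.inverse(q * delta)


def extract_bits(flow, stego, delta=0.05):
    """Recover the payload from the parity of the quantized latent."""
    z = flow.forward(stego)
    return np.rint(z / delta).astype(int) % 2


rng = np.random.default_rng(0)
flow = ToyFlow(dim=8, n_layers=4, rng=rng)

cover = rng.uniform(-1.0, 1.0, size=(16, 8))   # 16 frames of 8 "audio" samples
bits = rng.integers(0, 2, size=cover.shape)    # stand-in for compressed video

stego = embed_bits(flow, cover, bits)
recovered = extract_bits(flow, stego)
print("payload recovered:", np.array_equal(recovered, bits))
```

In this sketch the step size `delta` trades distortion against robustness: a smaller step perturbs the latent less, but leaves less margin if the stego signal is later quantized or otherwise altered before extraction.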
