Training CNNs in Presence of JPEG Compression: Multimedia Forensics vs Computer Vision

Convolutional Neural Networks (CNNs) have proved very accurate in many computer vision image classification tasks that once required visual inspection (e.g., object recognition, face detection). Motivated by these results, researchers have also started using CNNs to tackle image forensic problems (e.g., camera model identification, tampering detection). However, in computer vision, image classification methods typically rely on visual cues that are easily detectable by the human eye. Conversely, forensic solutions rely on almost invisible traces that are often very subtle and lie in the fine details of the image under analysis. For this reason, training a CNN to solve a forensic task requires special care, as common processing operations (e.g., resampling, compression) can strongly degrade forensic traces. In this work, we focus on the effect that JPEG has on CNN training, considering different computer vision and forensic image classification problems. Specifically, we consider the issues that arise from JPEG compression and from misalignment of the JPEG grid. We show that these effects must be taken into account when generating a training dataset in order to train a forensic detector without losing generalization capability, whereas they can be almost entirely ignored for computer vision tasks.
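To make the notion of JPEG grid misalignment concrete, the following is a minimal sketch, not the procedure used in this work. It assumes the standard 8x8 JPEG block size and a decoded image whose block grid starts at pixel (0, 0). The helper names (grid_offset, is_grid_aligned, sample_crop_origin) are hypothetical and introduced only for illustration. A patch cropped at coordinates that are not multiples of 8 no longer starts on a block boundary, which is the kind of misalignment a training-set generator may need to control.

```python
import random

JPEG_BLOCK = 8  # baseline JPEG codes the image in 8x8 pixel blocks


def grid_offset(x, y, block=JPEG_BLOCK):
    """Offset of a crop's top-left corner relative to the JPEG block grid.

    Assumes the grid of the source image starts at pixel (0, 0).
    """
    return (x % block, y % block)


def is_grid_aligned(x, y, block=JPEG_BLOCK):
    """True if a crop starting at (x, y) begins on a block boundary."""
    return grid_offset(x, y, block) == (0, 0)


def sample_crop_origin(width, height, patch, aligned, rng, block=JPEG_BLOCK):
    """Sample the top-left corner of a patch x patch crop inside the image.

    aligned=True  -> corner restricted to multiples of `block`
    aligned=False -> corner drawn uniformly, so the grid is usually shifted
    """
    max_x = width - patch
    max_y = height - patch
    if max_x < 0 or max_y < 0:
        raise ValueError("patch is larger than the image")
    if aligned:
        x = rng.randrange(0, max_x + 1, block)
        y = rng.randrange(0, max_y + 1, block)
    else:
        x = rng.randint(0, max_x)
        y = rng.randint(0, max_y)
    return x, y


if __name__ == "__main__":
    rng = random.Random(0)
    for aligned in (True, False):
        x, y = sample_crop_origin(512, 384, 64, aligned, rng)
        print(aligned, (x, y), grid_offset(x, y))
```

Sampling crop origins with aligned=True keeps every patch on the original block grid, while aligned=False exposes the network to arbitrary grid shifts. Which of the two a given task needs is an experimental question of the kind this work investigates.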
