Transform and Bitstream Domain Image Classification

Classifying images within the compressed domain offers significant benefits, including reduced memory and computational requirements for the classification system. This paper proposes two such methods as a proof of concept: the first classifies within the JPEG transform domain (i.e. on the DCT coefficient data); the second classifies the JPEG compressed binary bitstream directly. The two methods are implemented using Residual Network CNNs and an adapted Vision Transformer. Top-1 accuracies of approximately 70% and 60%, respectively, were achieved with these methods when classifying the Caltech C101 database. Although these results are significantly behind the state of the art for this database (~95%), they represent the first demonstration of direct bitstream image classification. This work confirms that direct bitstream image classification is possible and could be utilised in a first-pass database screening of a raw bitstream (within a wired or wireless network) or where computational, memory and bandwidth resources are severely restricted.
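To illustrate what "JPEG transform domain" data looks like, the following minimal sketch computes the 8x8 block DCT of a grayscale image using only the Python standard library. It is not the authors' pipeline: the function names (`dct_matrix`, `block_dct`, `image_to_dct_blocks`) are hypothetical, the input is assumed to be a list of pixel rows whose dimensions are multiples of 8, and quantisation, chroma handling and entropy coding are all omitted.

```python
import math

N = 8  # JPEG operates on 8x8 blocks


def dct_matrix(n=N):
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    mat = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                    for x in range(n)])
    return mat


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def block_dct(block):
    """2-D DCT of one square block, computed as C @ block @ C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def image_to_dct_blocks(pixels):
    """Split a grayscale image into 8x8 blocks and transform each one.

    Pixels are level-shifted by 128 before the transform, as in baseline
    JPEG. Returns a grid (list of rows) of 8x8 coefficient blocks.
    """
    height, width = len(pixels), len(pixels[0])
    grid = []
    for by in range(0, height, N):
        row_blocks = []
        for bx in range(0, width, N):
            block = [[pixels[by + y][bx + x] - 128 for x in range(N)]
                     for y in range(N)]
            row_blocks.append(block_dct(block))
        grid.append(row_blocks)
    return grid
```

A transform-domain classifier would consume coefficient blocks of this kind (after quantisation in a real JPEG file) instead of decoded pixels; how the blocks are arranged into network input is a design choice that the abstract does not specify.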
