Paper Information - Transform and Bitstream Domain Image Classification

Transform and Bitstream Domain Image Classification

Classification of images within the compressed domain offers significant benefits, including reduced memory and computational requirements for a classification system. This paper proposes two such methods as a proof of concept: the first classifies within the JPEG image transform domain (i.e. DCT data); the second classifies the JPEG compressed binary bitstream directly. These two methods are implemented using Residual Network CNNs and an adapted Vision Transformer. Top-1 accuracies of approximately 70% and 60%, respectively, were achieved using these methods when classifying the Caltech C101 database. Although these results are significantly behind the state of the art for classification on this database (~95%), they represent the first time direct bitstream image classification has been achieved. This work confirms that direct bitstream image classification is possible and could be utilised in a first-pass database screening of a raw bitstream (within a wired or wireless network) or where computational, memory and bandwidth resources are severely restricted.

D. R. Bull | P. R. Hill
