CLASSIFICATION OF SEVERELY OCCLUDED IMAGE SEQUENCES VIA CONVOLUTIONAL RECURRENT NEURAL NETWORKS

Classifying severely occluded images is a challenging yet much-needed task. In this paper, motivated by the fact that humans can exploit context information to assist learning, we apply a convolutional recurrent neural network (CRNN) to this problem. We present a CRNN architecture that integrates a convolutional neural network (CNN) with long short-term memory (LSTM), and we create three new datasets of severely occluded images with context information. Extensive experiments compare the performance of the CRNN against conventional methods and human experimenters. The experimental results show that the CRNN outperforms both the conventional methods and most of the human experimenters. This demonstrates that a CRNN can effectively learn and exploit the unspecified context information among image sequences, and can thus be an effective approach to classifying severely occluded images.
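The combination of a CNN with an LSTM described above can be illustrated with a minimal forward-pass sketch. It is not the architecture from this paper: the layer sizes, the single convolution layer with global max pooling, the random untrained weights, and every function name (conv2d_valid, cnn_features, lstm_step, crnn_forward, init_model) are assumptions chosen for illustration. The sketch only shows the data flow: each frame passes through a convolutional feature extractor, and an LSTM carries state across frames so that each per-frame prediction can depend on earlier frames.

```python
import math
import random

def conv2d_valid(image, kernel):
    """Valid (no padding) 2D cross-correlation of one channel with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def cnn_features(image, kernels):
    """One conv layer + ReLU + global max pooling: one feature per kernel."""
    feats = []
    for k in kernels:
        fmap = conv2d_valid(image, k)
        feats.append(max(max(0.0, v) for row in fmap for v in row))
    return feats

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def lstm_step(x, h, c, params):
    """Standard LSTM cell update; params maps gate name -> (W, b) acting on [x; h]."""
    z = x + h  # concatenate input features and previous hidden state
    gate = {}
    for name in ("i", "f", "o", "g"):
        W, b = params[name]
        pre = [p + bi for p, bi in zip(matvec(W, z), b)]
        act = math.tanh if name == "g" else sigmoid
        gate[name] = [act(p) for p in pre]
    c_new = [f * cj + i * g
             for f, cj, i, g in zip(gate["f"], c, gate["i"], gate["g"])]
    h_new = [o * math.tanh(cj) for o, cj in zip(gate["o"], c_new)]
    return h_new, c_new

def crnn_forward(frames, kernels, lstm_params, W_out, hidden):
    """Return one vector of class logits per frame of the sequence."""
    h, c = [0.0] * hidden, [0.0] * hidden
    per_frame_logits = []
    for frame in frames:
        x = cnn_features(frame, kernels)        # spatial features of this frame
        h, c = lstm_step(x, h, c, lstm_params)  # mix in context from earlier frames
        per_frame_logits.append(matvec(W_out, h))
    return per_frame_logits

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def init_model(rng, n_kernels=4, ksize=3, hidden=5, n_classes=3):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    kernels = [random_matrix(rng, ksize, ksize) for _ in range(n_kernels)]
    lstm_params = {name: (random_matrix(rng, hidden, n_kernels + hidden),
                          [0.0] * hidden)
                   for name in "ifog"}
    W_out = random_matrix(rng, n_classes, hidden)
    return kernels, lstm_params, W_out, hidden

# Usage: a sequence of six random 8x8 "frames" standing in for occluded images.
rng = random.Random(0)
kernels, lstm_params, W_out, hidden = init_model(rng)
frames = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(6)]
logits = crnn_forward(frames, kernels, lstm_params, W_out, hidden)
predictions = [max(range(len(l)), key=l.__getitem__) for l in logits]
print(predictions)
```

Because the weights are untrained, the printed class indices are meaningless. In a trained model, the recurrent state is what would let a heavily occluded frame be classified using information from its neighbours in the sequence.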
