Gradual recovery based occluded digit images recognition

Recent research shows that auto-encoders are well suited to modeling variations that change smoothly. In this paper, we use auto-encoders to recognize partially occluded digit images through gradual recovery. We propose a new variant of the auto-encoder, the “generalized auto-encoder”, and construct stacked generalized auto-encoders (SGAE) for recovering and recognizing occluded digit images. Rather than recovering the occluded region directly, we treat the degree of occlusion as a continuous variable and the recovery task as a gradual process. We divide the whole task into multiple intermediate recovery procedures and assign each procedure to one generalized auto-encoder, so that the occlusion is removed step by step. Given the encouraging recovery results, the occluded digit images can then be recognized well. The results demonstrate that gradual recovery outperforms direct recovery of the occluded region. Although the main application in this paper is occluded digit image recognition, the proposed framework can be readily generalized to other problems. Extensive experiments are designed to verify our settings and to show the effectiveness, extensibility and generalizability of the method.
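
The gradual-recovery idea can be illustrated with a minimal NumPy sketch. This is not the authors' SGAE: the "generalized auto-encoder" is not specified in this abstract, so a plain one-hidden-layer auto-encoder stands in for it. Everything else is also an assumption made for illustration. Occlusion is modeled as a zeroed square block in the bottom-right corner of toy 8 x 8 images. Its side length shrinks over a hypothetical schedule such as `[6, 4, 2, 0]`. Each stage is trained to map one occlusion level to the next, milder one. The names `occlude`, `stage_pairs`, `TinyAutoEncoder`, `train_gradual` and `recover` are all hypothetical.

```python
import numpy as np

SIDE = 8  # hypothetical toy image size (8 x 8), not the paper's setting


def occlude(images, size, side=SIDE):
    """Zero a size x size block in the bottom-right corner of each flattened image."""
    out = images.reshape(-1, side, side).copy()
    if size > 0:
        out[:, side - size:, side - size:] = 0.0
    return out.reshape(len(images), side * side)


def stage_pairs(clean, sizes):
    """Stage k maps occlusion level sizes[k] to the milder level sizes[k + 1]."""
    return [(occlude(clean, a), occlude(clean, b)) for a, b in zip(sizes[:-1], sizes[1:])]


class TinyAutoEncoder:
    """Sigmoid hidden layer + linear decoder, full-batch gradient descent on MSE.
    A plain stand-in (assumption) for the paper's generalized auto-encoder."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))
        self.b2 = np.zeros(n_in)

    def _forward(self, x):
        h = 1.0 / (1.0 + np.exp(-(x @ self.W1 + self.b1)))
        return h, h @ self.W2 + self.b2

    def predict(self, x):
        return self._forward(x)[1]

    def fit(self, x, y, lr=1.0, epochs=800):
        """Minimise mean((f(x) - y) ** 2); returns the per-epoch loss history."""
        history = []
        for _ in range(epochs):
            h, y_hat = self._forward(x)
            err = y_hat - y
            history.append(float(np.mean(err ** 2)))
            d_out = 2.0 * err / err.size                # dL/dy_hat
            d_h = (d_out @ self.W2.T) * h * (1.0 - h)   # backprop through the sigmoid
            self.W2 -= lr * (h.T @ d_out)
            self.b2 -= lr * d_out.sum(axis=0)
            self.W1 -= lr * (x.T @ d_h)
            self.b1 -= lr * d_h.sum(axis=0)
        return history


def train_gradual(clean, sizes, n_hidden=32, seed=0):
    """Train one auto-encoder per intermediate recovery step."""
    rng = np.random.default_rng(seed)
    stages, histories = [], []
    for x, y in stage_pairs(clean, sizes):
        ae = TinyAutoEncoder(clean.shape[1], n_hidden, rng)
        histories.append(ae.fit(x, y))
        stages.append(ae)
    return stages, histories


def recover(stages, occluded):
    """Chain the stages: each one removes part of the remaining occlusion."""
    x = occluded
    for ae in stages:
        x = ae.predict(x)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    prototypes = rng.random((10, SIDE * SIDE))  # ten toy "digit" templates (made up)
    labels = rng.integers(0, 10, size=200)
    clean = np.clip(prototypes[labels] + 0.05 * rng.normal(size=(200, SIDE * SIDE)), 0.0, 1.0)
    sizes = [6, 4, 2, 0]  # occlusion shrinks gradually to none
    stages, _ = train_gradual(clean, sizes)
    occluded = occlude(clean, sizes[0])
    print("occluded MSE:", np.mean((occluded - clean) ** 2))
    print("restored MSE:", np.mean((recover(stages, occluded) - clean) ** 2))
```

In this sketch, each stage is trained on ground-truth pairs at adjacent occlusion levels rather than on the previous stage's outputs. At test time the stages are simply chained. The paper's actual training and stacking procedure may differ.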
