Integrating pixels and segments: A deep-learning method inspired by the informational diversity of the visual pathways

Abstract: The visual cortex processes information along multiple pathways and integrates various forms of representation. This paper proposes a bio-inspired method that uses a line-segment-based representation as a dedicated channel for geometric feature learning. The extracted geometric information is integrated with the original pixel-based information, and the approach is implemented on both convolutional neural networks (SegCNN) and stacked autoencoders (SegSAE). Segment-based operations such as segConvolve and segPooling are designed to further process the extracted geometric features. The proposed models are evaluated on the MNIST, Caltech 101, and QuickDraw datasets for image classification. According to the experimental results, the proposed models improve classification accuracy, especially when the training set is small. In particular, the method based on multiple representations is found to be effective for classifying hand-drawn sketches.
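
The idea of pairing a pixel channel with a line-segment channel can be illustrated with a small sketch. This is not the paper's segConvolve or segPooling, whose definitions are not given in the abstract. The function names `describe_segments`, `pool_longest`, and `fuse_channels`, the (x1, y1, x2, y2) segment format, the midpoint/length/angle descriptor, and the keep-the-longest pooling rule are all assumptions made for illustration only.

```python
import numpy as np


def describe_segments(segments):
    """Map (x1, y1, x2, y2) rows to (mid_x, mid_y, length, angle) rows.

    Hypothetical descriptor: the source does not specify how segments
    are encoded.
    """
    seg = np.asarray(segments, dtype=float).reshape(-1, 4)
    dx = seg[:, 2] - seg[:, 0]
    dy = seg[:, 3] - seg[:, 1]
    mid_x = (seg[:, 0] + seg[:, 2]) / 2.0
    mid_y = (seg[:, 1] + seg[:, 3]) / 2.0
    length = np.hypot(dx, dy)
    # Segments are undirected, so fold the angle into [0, pi).
    angle = np.mod(np.arctan2(dy, dx), np.pi)
    return np.stack([mid_x, mid_y, length, angle], axis=1)


def pool_longest(descriptors, k):
    """Keep the k longest segments, zero-padded, as a flat vector.

    Hypothetical stand-in for a segment-level pooling step.
    """
    order = np.argsort(-descriptors[:, 2], kind="stable")[:k]
    pooled = np.zeros((k, descriptors.shape[1]))
    pooled[: len(order)] = descriptors[order]
    return pooled.ravel()


def fuse_channels(image, segments, k=8):
    """Concatenate flattened pixels with the pooled segment descriptors."""
    pixel_vec = np.asarray(image, dtype=float).ravel()
    seg_vec = pool_longest(describe_segments(segments), k)
    return np.concatenate([pixel_vec, seg_vec])


if __name__ == "__main__":
    image = np.zeros((28, 28))              # MNIST-sized placeholder image
    segments = [(0, 0, 3, 4), (0, 0, 0, 1)]  # made-up segments
    fused = fuse_channels(image, segments, k=2)
    print(fused.shape)
```

In this sketch the fused vector would be the input to a downstream classifier. Simple concatenation is only one way to integrate the two channels, and the models in the paper may combine them differently.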
