Building Hierarchical Representations for Oracle Character and Sketch Recognition

In this paper, we study oracle character recognition and general sketch recognition. First, a data set of oracle characters, which are the oldest hieroglyphs in China yet remain a part of modern Chinese characters, is collected for analysis. Second, typical visual representations in shape- and sketch-related works are evaluated. We analyze the problems suffered when addressing these representations and determine several representation design criteria. Based on the analysis, we propose a novel hierarchical representation that combines a Gabor-related low-level representation and a sparse-encoder-related mid-level representation. Extensive experiments show the effectiveness of the proposed representation in both oracle character recognition and general sketch recognition. The proposed representation is also complementary to convolutional neural network (CNN)-based models. We introduce a solution to combine the proposed representation with CNN-based models, and achieve better performances over both approaches. This solution has beaten humans at recognizing general sketches.

[1]  Liqing Zhang,et al.  Edgel index for large-scale sketch-based image search , 2011, CVPR 2011.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[4]  Gunilla Borgefors,et al.  Hierarchical Chamfer Matching: A Parametric Edge Matching Algorithm , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  C. Lawrence Zitnick,et al.  Structured Forests for Fast Edge Detection , 2013, 2013 IEEE International Conference on Computer Vision.

[6]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[7]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Jitendra Malik,et al.  Matching Shapes , 2001, ICCV.

[9]  Jean-Marc Odobez,et al.  Analyzing Ancient Maya Glyph Collections with Contextual Shape Descriptors , 2011, International Journal of Computer Vision.

[10]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  C. Lawrence Zitnick,et al.  Binary Coherent Edge Descriptors , 2010, ECCV.

[12]  Alexei A. Efros,et al.  Data-driven visual similarity for cross-domain image matching , 2011, ACM Trans. Graph..

[13]  Marc'Aurelio Ranzato,et al.  Efficient Learning of Sparse Representations with an Energy-Based Model , 2006, NIPS.

[14]  Quoc V. Le,et al.  On optimization methods for deep learning , 2011, ICML.

[15]  Marc Alexa,et al.  How do humans sketch objects? , 2012, ACM Trans. Graph..

[16]  Pascal Vincent,et al.  Contractive Auto-Encoders: Explicit Invariance During Feature Extraction , 2011, ICML.

[17]  Trevor Darrell,et al.  Recognizing Image Style , 2013, BMVC.

[18]  Liqing Zhang,et al.  MindFinder: interactive sketch-based image search on millions of images , 2010, ACM Multimedia.

[19]  Tao Wang,et al.  End-to-end text recognition with convolutional neural networks , 2012, Proceedings of the 21st International Conference on Pattern Recognition (ICPR2012).

[20]  Andrew Zisserman,et al.  In Search of Art , 2014, ECCV Workshops.

[21]  Mathieu Aubry,et al.  Painting-to-3D model alignment via discriminative visual elements , 2014, TOGS.

[22]  Thomas S. Huang,et al.  Supervised translation-invariant sparse coding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[23]  David Finkelstein Handbook of Research on Writing , 2007 .

[24]  Liu Yi-feng,et al.  Graphs,Words,and Meanings:Three Reference Works for Shang Oracle-Bone Studies , 2007 .

[25]  Björn Stenger,et al.  Model-based hand tracking using a hierarchical Bayesian filter , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Frédéric Jurie,et al.  Sampling Strategies for Bag-of-Features Image Classification , 2006, ECCV.

[27]  Marc Alexa,et al.  Sketch-Based Image Retrieval: Benchmark and Bag-of-Features Descriptors , 2011, IEEE Transactions on Visualization and Computer Graphics.

[28]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[29]  F. G. D. Matos Handbook of Research on Writing: History, Society, School, Individual, Text , 2008 .

[30]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[31]  Yoshua Bengio,et al.  Greedy Layer-Wise Training of Deep Networks , 2006, NIPS.

[32]  Cor J. Veenman,et al.  Kernel Codebooks for Scene Categorization , 2008, ECCV.

[33]  Andrew Zisserman,et al.  The State of the Art: Object Retrieval in Paintings using Discriminative Regions , 2014, BMVC.

[34]  Ilan Shimshoni,et al.  Mean shift based clustering in high dimensions: a texture classification example , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[35]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[36]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[37]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[38]  Honglak Lee,et al.  An Analysis of Single-Layer Networks in Unsupervised Feature Learning , 2011, AISTATS.

[39]  Rowan K. Flad Divination and Power , 2008, Current Anthropology.

[40]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[42]  Thomas S. Huang,et al.  Coupled Dictionary Training for Image Super-Resolution , 2012, IEEE Transactions on Image Processing.

[43]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[44]  Ching Y. Suen,et al.  Thinning Methodologies - A Comprehensive Survey , 1992, IEEE Trans. Pattern Anal. Mach. Intell..

[45]  Changhu Wang,et al.  Indexing billions of images for sketch-based retrieval , 2013, ACM Multimedia.

[46]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[47]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[48]  Matthijs Douze,et al.  Searching in one billion vectors: Re-rank with source coding , 2011, 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[49]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.