Pyramidal Stochastic Graphlet Embedding for Document Pattern Classification

Graph-based methods for document pattern classification have received considerable attention because of their robust representation paradigm and rich theoretical background. However, the way documents are preserved and the process of representing them as graphs introduce noise into the rendition of the underlying data, which makes the graph representation unstable. To deal with this unreliability, we propose Pyramidal Stochastic Graphlet Embedding (PSGE). Given a graph representing a document pattern, our method first computes a graph pyramid by successively reducing the base graph. Once the pyramid is computed, we apply Stochastic Graphlet Embedding (SGE) to each level of the pyramid and combine the embedded representations to obtain a global description of the original graph. Considering a pyramid of graphs rather than only the base graph extends the representational power of the graph embedding, which reduces the instability caused by noise and distortion. When combined with a support vector machine, the proposed PSGE outperforms state-of-the-art results in the recognition of handwritten words as well as graphical symbols.
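The pipeline described above (build a pyramid, embed each level, combine the results) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the coarsening rule (contracting a greedy edge matching), the graphlet signature (sorted degree sequence), the hashing into a fixed number of bins, and all function names and parameters are hypothetical stand-ins chosen to keep the example short.

```python
import random
from collections import Counter


def coarsen(adj):
    """Reduce a graph by contracting a greedy maximal matching.

    ASSUMPTION: the abstract only says the base graph is 'successively
    reduced'; this contraction rule is a stand-in for illustration.
    """
    parent = {}
    for u in sorted(adj):
        if u in parent:
            continue
        parent[u] = u
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u  # merge v into the super-node of u
                break
    coarse = {p: set() for p in set(parent.values())}
    for u, nbrs in adj.items():
        for v in nbrs:
            pu, pv = parent[u], parent[v]
            if pu != pv:
                coarse[pu].add(pv)
                coarse[pv].add(pu)
    return coarse


def build_pyramid(adj, levels):
    """Return [base graph, reduced graph, ...] with `levels` entries."""
    pyramid = [adj]
    for _ in range(levels - 1):
        pyramid.append(coarsen(pyramid[-1]))
    return pyramid


def sample_graphlet(adj, max_edges, rng):
    """Grow a random connected subgraph with at most `max_edges` edges."""
    visited = {rng.choice(sorted(adj))}
    edges = set()
    for _ in range(max_edges):
        candidates = sorted(
            {(min(u, v), max(u, v)) for u in visited for v in adj[u]} - edges
        )
        if not candidates:
            break
        edge = rng.choice(candidates)
        edges.add(edge)
        visited.update(edge)
    return edges


def graphlet_signature(edges):
    """ASSUMPTION: a crude, permutation-invariant signature (degree sequence)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return tuple(sorted(degree.values()))


def embed_level(adj, num_samples, max_edges, n_bins, rng):
    """Normalized histogram of hashed graphlet signatures for one level."""
    hist = [0.0] * n_bins
    for _ in range(num_samples):
        sig = graphlet_signature(sample_graphlet(adj, max_edges, rng))
        hist[hash(sig) % n_bins] += 1.0 / num_samples
    return hist


def pyramidal_embedding(adj, levels=3, num_samples=200, max_edges=3,
                        n_bins=16, seed=0):
    """Concatenate the per-level embeddings into one feature vector."""
    rng = random.Random(seed)
    vector = []
    for level_graph in build_pyramid(adj, levels):
        vector.extend(embed_level(level_graph, num_samples, max_edges,
                                  n_bins, rng))
    return vector


# Toy base graph: a path on 8 nodes, stored as an adjacency dictionary.
path = {i: set() for i in range(8)}
for i in range(7):
    path[i].add(i + 1)
    path[i + 1].add(i)

features = pyramidal_embedding(path)
```

The resulting fixed-length vector could then be passed to any vector-space classifier, such as the support vector machine mentioned in the abstract. Concatenation is only one way to combine the levels; the sketch uses it because it keeps each level's contribution separable.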
