Recent advances in graph-based pattern recognition with applications in document analysis

Graphs are a powerful and popular representation formalism in pattern recognition, and they have found widespread application in document analysis in particular. From a formal point of view, however, graphs are quite limited: most of the mathematical operations needed to build common algorithms, such as classifiers or clustering schemes, are not defined for them. Consequently, there is a severe lack of algorithmic procedures that can be applied directly to graphs. Recent work, however, aims to overcome these limitations. The present paper first reviews the use of graph representations in document analysis. We then discuss a number of novel approaches, including graph kernels and graph embedding, that make tools from statistical pattern recognition available to graphs. In several experiments on different data sets from the field of document analysis, we show that the new methods have great potential to outperform traditional procedures applied to graph representations.
