Fast Computation of Subpath Kernel for Trees

Kernel methods are a promising approach to analyzing structured data such as sequences, trees, and graphs; however, kernels for unordered trees have not been investigated extensively. Kimura et al. (2011) proposed a kernel function for unordered trees based on their subpaths, which are vertical substructures that capture a tree's hierarchical information. Their kernel performs well in practice in terms of both accuracy and speed; however, unlike the unordered tree kernel proposed by Vishwanathan and Smola (2003), its computation is not theoretically guaranteed to run in linear time. In this paper, we propose a kernel computation algorithm that is guaranteed to run in linear time and is also fast in practice, and we present an efficient prediction algorithm whose running time depends only on the size of the input tree. Experimental results show that the proposed algorithms are efficient in practice.
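To make the notion of a subpath concrete, the following is a minimal Python sketch of a naive baseline, not the linear-time algorithm proposed in the paper. It assumes that a subpath is a contiguous downward sequence of node labels, and that the kernel value is the unweighted sum, over all shared subpaths, of the product of their occurrence counts in the two trees. The `(label, children)` tuple representation and the names `subpath_counts` and `subpath_kernel` are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Assumption: a tree is a (label, [children]) tuple. This representation is
# hypothetical and used only for this illustration.


def subpath_counts(tree):
    """Count every contiguous downward label sequence (subpath) in the tree."""
    counts = Counter()

    def visit(node, paths_to_parent):
        label, children = node
        # Extend each subpath that ends at the parent, and start a new one here.
        paths = [p + (label,) for p in paths_to_parent] + [(label,)]
        counts.update(paths)
        for child in children:
            visit(child, paths)

    visit(tree, [])
    return counts


def subpath_kernel(t1, t2):
    """Naive, unweighted subpath kernel: sum of products of subpath counts."""
    c1, c2 = subpath_counts(t1), subpath_counts(t2)
    return sum(n * c2[p] for p, n in c1.items())


t1 = ("a", [("b", []), ("b", [("c", [])])])
t2 = ("b", [("c", [])])
print(subpath_kernel(t1, t2))  # shared subpaths: (b), (b, c), (c)
```

Because the sketch enumerates every ancestor-descendant pair explicitly, its cost grows with the tree depth as well as the tree size; it is useful only as a reference against which a faster implementation could be checked. The result does not depend on the order of children, which is what makes the kernel suitable for unordered trees.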

[1] Nello Cristianini, et al. Classification using String Kernels, 2000.

[2] Michael A. Bender, et al. The Level Ancestor Problem Simplified, 2002, LATIN.

[3] Hisashi Kashima, et al. A Subpath Kernel for Rooted Unordered Trees, 2011.

[4] Choon Hui Teo, et al. Fast and space efficient string kernels using suffix arrays, 2006, ICML.

[5] D. Marcu, et al. A Tree-Position Kernel for Document Compression, 2004.

[6] Peter Sanders, et al. Simple Linear Work Suffix Array Construction, 2003, ICALP.

[7] Enno Ohlebusch, et al. Replacing suffix trees with enhanced suffix arrays, 2004, J. Discrete Algorithms.

[8] Alessandro Moschitti, et al. Fast Support Vector Machines for Structural Kernels, 2011, ECML/PKDD.

[9] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[10] Michael Collins, et al. Convolution Kernels for Natural Language, 2001, NIPS.

[11] Hiroki Arimura, et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, 2001, CPM.

[12] Fabrizio Luccio, et al. Structuring labeled trees for optimal succinctness, and beyond, 2005, 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS'05).

[13] Hisashi Kashima, et al. Kernels for Semi-Structured Data, 2002, ICML.

[14] Charu C. Aggarwal, et al. XRules: an effective structural classifier for XML data, 2003, KDD '03.

[15] Susumu Goto, et al. GLYCAN: The Database of Carbohydrate Structures, 2003.

[16] Robert Sedgewick, et al. Fast algorithms for sorting and searching strings, 1997, SODA '97.

[17] Alessandro Moschitti, et al. Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees, 2006, ECML.

[18] Jun Sun, et al. Tree Sequence Kernel for Natural Language, 2011, AAAI.

[19] Thomas Gärtner, et al. On Graph Kernels: Hardness Results and Efficient Alternatives, 2003, COLT.

[20] Hisashi Kashima. Machine learning approaches for structured data, 2007.

[21] Hiroki Arimura, et al. Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications, 2001.

[22] Eleazar Eskin, et al. The Spectrum Kernel: A String Kernel for SVM Protein Classification, 2001, Pacific Symposium on Biocomputing.

[23] Hiroshi Yasuda, et al. A gram distribution kernel applied to glycan classification and motif extraction, 2006, Genome informatics. International Conference on Genome Informatics.

[24] Hisashi Kashima, et al. Marginalized Kernels Between Labeled Graphs, 2003, ICML.

[25] Alexander J. Smola, et al. Fast Kernels for String and Tree Matching, 2002, NIPS.

[26] David Haussler, et al. Convolution kernels on discrete structures, 1999.

[27] Alessandro Sperduti, et al. Route kernels for trees, 2009, ICML '09.

[28] Alexander J. Smola, et al. Learning with kernels, 1998.

[29] Charu C. Aggarwal, et al. XRules: An effective algorithm for structural classification of XML data, 2006, Machine Learning.

[30] Tetsuo Shibuya. Constructing the Suffix Tree of a Tree with a Large Alphabet, 1999, ISAAC.

[31] R. Dwek, et al. Glycobiology, 2018, Biochimie.