Metric Learning for Ordered Labeled Trees with pq-grams

Computing the similarity between two data points plays a vital role in many machine learning algorithms. Metric learning aims to learn a good metric automatically from data. Most existing studies on metric learning for tree-structured data have adopted the approach of learning the tree edit distance. However, the edit distance is not amenable to large-scale data analysis because of its high computation cost. In this paper, we propose a new metric learning approach for tree-structured data based on pq-grams. The pq-gram distance is a distance for ordered labeled trees with much lower computation cost than the tree edit distance. To perform metric learning based on pq-grams, we propose the weighted pq-gram distance, a new differentiable parameterized distance. We also propose a way to learn the proposed distance based on Large Margin Nearest Neighbor (LMNN), a well-studied and practical metric learning scheme. We formulate the metric learning problem as an optimization problem and solve it with gradient descent. We empirically show that the proposed approach not only achieves results competitive with state-of-the-art edit-distance-based methods on various classification problems, but also solves those problems much faster than the edit-distance-based methods.
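To illustrate the unweighted distance the paper builds on, the sketch below is a minimal Python implementation of the pq-gram distance of Augsten et al.: each pq-gram concatenates a register of p ancestor labels with a register of q consecutive sibling labels, padding with a dummy label `*`, and the distance compares the two bags of pq-grams. The tuple-based tree representation `(label, [children])` and the function names are illustrative assumptions, not the authors' code.

```python
from collections import Counter

def pq_profile(tree, p=2, q=3, dummy="*"):
    """Collect the bag (multiset) of pq-grams of a tree given as (label, [children])."""
    grams = []

    def recurse(node, anc):
        label, children = node
        anc = anc[1:] + (label,)           # shift this node's label into the ancestor register
        sib = (dummy,) * q                 # sibling register starts as all dummies
        if not children:
            grams.append(anc + sib)        # a leaf yields one gram padded with dummies
        else:
            for child in children:
                sib = sib[1:] + (child[0],)
                grams.append(anc + sib)
                recurse(child, anc)
            for _ in range(q - 1):         # flush trailing dummies past the last child
                sib = sib[1:] + (dummy,)
                grams.append(anc + sib)

    recurse(tree, (dummy,) * p)
    return Counter(grams)

def pq_gram_distance(t1, t2, p=2, q=3):
    """Normalized pq-gram distance: 1 - 2|P1 n P2| / |P1 + P2| under bag semantics."""
    P1, P2 = pq_profile(t1, p, q), pq_profile(t2, p, q)
    inter = sum((P1 & P2).values())        # bag intersection size
    total = sum(P1.values()) + sum(P2.values())
    return 1.0 - 2.0 * inter / total
```

Identical trees yield distance 0, and disjoint profiles yield 1; the weighted variant proposed in the paper replaces the uniform count of each pq-gram with a learnable weight, which is what makes the distance differentiable and trainable with LMNN.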
