Tree Edit Distance Learning via Adaptive Symbol Embeddings

Metric learning aims to improve classification accuracy by learning a distance measure that brings data points from the same class closer together and pushes data points from different classes further apart. Recent research has demonstrated that metric learning can also be applied to trees, such as molecular structures, abstract syntax trees of computer programs, or syntax trees of natural language, by learning the cost function of an edit distance, i.e., the costs of replacing, deleting, or inserting nodes in a tree. However, learning such costs directly may yield an edit distance that violates the metric axioms, is difficult to interpret, and may generalize poorly. In this contribution, we propose a novel metric learning approach for trees that learns an edit distance indirectly, by embedding the tree nodes as vectors such that the Euclidean distance between those vectors supports class discrimination. We learn such embeddings by reducing the distance to prototypical trees from the same class and increasing the distance to prototypical trees from different classes. In our experiments, we show that the proposed approach improves upon the state of the art in metric learning for trees on six benchmark data sets, ranging from computer science and biomedical data to a natural language processing data set containing over 300,000 nodes.
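To make the construction concrete, consider the following minimal sketch in Python. It is not the authors' implementation: the symbols a, b, c and their two-dimensional embedding vectors are invented for illustration, and the recursion shown is a simple memoized variant rather than an efficient algorithm such as Zhang-Shasha. The point it illustrates is that replacement costs become Euclidean distances between symbol embeddings, and deletion/insertion costs become distances to a gap symbol embedded at the origin; since all costs are then Euclidean distances, the resulting edit distance satisfies the metric axioms by construction (up to symbols sharing the same embedding).

from functools import lru_cache
import numpy as np

# Hypothetical learned embeddings, one vector per tree-node symbol.
# In a learned model these vectors would be optimized; here they are fixed.
EMB = {
    "a": np.array([0.0, 1.0]),
    "b": np.array([1.0, 0.0]),
    "c": np.array([1.0, 1.0]),
}

def c_sub(x, y):
    # Replacement cost: Euclidean distance between symbol embeddings.
    return float(np.linalg.norm(EMB[x] - EMB[y]))

def c_del(x):
    # Deletion cost: distance to the gap symbol, embedded at the origin.
    return float(np.linalg.norm(EMB[x]))

c_ins = c_del  # insertion mirrors deletion, keeping the distance symmetric

def ted(tree1, tree2):
    # Ordered tree edit distance with embedding-based costs.
    # Trees are (label, (child, child, ...)) tuples. The memoized recursion
    # over rightmost roots is correct but only meant to show how the costs
    # enter the distance, not to be efficient.
    @lru_cache(maxsize=None)
    def d(F, G):  # F, G are forests: tuples of trees
        if not F and not G:
            return 0.0
        if not F:
            label, kids = G[-1]
            return d(F, G[:-1] + kids) + c_ins(label)
        if not G:
            label, kids = F[-1]
            return d(F[:-1] + kids, G) + c_del(label)
        (v, v_kids), (w, w_kids) = F[-1], G[-1]
        return min(
            d(F[:-1] + v_kids, G) + c_del(v),   # delete rightmost root of F
            d(F, G[:-1] + w_kids) + c_ins(w),   # insert rightmost root of G
            d(F[:-1], G[:-1]) + d(v_kids, w_kids) + c_sub(v, w),  # match roots
        )
    return d((tree1,), (tree2,))

# Example: distance between a(b, c) and a(c); everything matches exactly
# except b, which must be deleted, so the result is c_del("b") = 1.0.
x = ("a", (("b", ()), ("c", ())))
y = ("a", (("c", ()),))
print(ted(x, y))

A learning algorithm in the spirit of the paper would then adjust the embedding vectors, rather than the raw edit costs, so as to pull trees toward same-class prototypes and push them away from prototypes of other classes.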
