Instance-Based Neural Dependency Parsing

Abstract: Interpretable rationales for model predictions are crucial in practical applications. We develop neural models whose inference process for dependency parsing is interpretable. Our models adopt instance-based inference, in which dependency edges are extracted and labeled by comparing them to edges in a training set. Because the training edges are used explicitly for the predictions, it is easy to see how much each training edge contributes to a prediction. Our experiments show that our instance-based models achieve accuracy competitive with standard neural models and that their instance-based explanations are reasonably plausible.
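As a rough illustration of the instance-based idea described in the abstract, the sketch below labels one candidate edge by comparing its vector to a small memory of labeled training edges. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `label_edge_by_instances`, the toy two-dimensional edge vectors, the dot product as similarity, and the softmax normalization over training edges.

```python
import math
from collections import defaultdict


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def label_edge_by_instances(query, memory):
    """Label a candidate edge by comparing it to stored training edges.

    query:  vector for the candidate edge (hypothetical representation).
    memory: list of (vector, label) pairs for training edges.
    Returns the best label, the per-label scores, and the weight of each
    training edge, which serves as its contribution to the prediction.
    """
    sims = [dot(query, vec) for vec, _ in memory]
    top = max(sims)
    # Softmax over training edges (assumed normalization), shifted for stability.
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    weights = [e / total for e in exps]

    # A label's score is the summed weight of the training edges carrying it.
    scores = defaultdict(float)
    for w, (_, lab) in zip(weights, memory):
        scores[lab] += w
    best = max(scores, key=scores.get)
    return best, dict(scores), weights


# Toy memory of training edges with made-up vectors and labels.
memory = [
    ([1.0, 0.0], "nsubj"),
    ([0.9, 0.1], "nsubj"),
    ([0.0, 1.0], "obj"),
]
label, scores, weights = label_edge_by_instances([0.8, 0.2], memory)
print(label, scores, weights)
```

The returned `weights` list is what makes the prediction inspectable in this sketch: each entry is tied to one specific training edge, so the edges with the largest weights can be shown as the evidence for the chosen label.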
