Few-Shot Sequence Labeling with Label Dependency Transfer

Few-shot sequence labeling faces a challenge unique among few-shot classification problems: it must model the dependencies between labels. Because different domains often have different label sets, label dependencies learned in one domain are difficult to use directly in another. In this paper, we introduce a dependency transfer mechanism that addresses this label discrepancy problem. It learns abstract label transition patterns from the source domains and generalizes these patterns to the target domain to benefit the prediction of a label sequence. We also develop a sequence matching network by adapting the matching network to the sequence labeling case, and we propose a CRF-based few-shot sequence labeling framework that integrates both the dependency transfer mechanism and the sequence matching network. Experiments on slot tagging (ST) and named entity recognition (NER) datasets show that our model significantly outperforms the strongest few-shot learning baseline, by 7.96 and 11.70 F1 points respectively in the 1-shot setting.
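To make the idea concrete, the following is a minimal, hypothetical sketch of how abstract transition patterns could transfer across label sets: transition scores are kept over abstract BIO roles (where an I tag either continues the same entity type or switches to a different one), expanded to an unseen target label set, and combined with per-token emission scores via standard Viterbi decoding, as in a CRF. The score values, helper names, and label set below are illustrative assumptions, not the paper's actual implementation.

```python
# Illustrative abstract transition scores, as if learned on source domains.
# "I-same" continues the same entity type; "I-other" switches type.
ABSTRACT = {
    ("O", "O"): 1.2, ("O", "B"): 0.4, ("O", "I"): -3.0,
    ("B", "B"): -0.5, ("B", "O"): 0.1,
    ("B", "I-same"): 1.5, ("B", "I-other"): -2.0,
    ("I", "B"): -0.3, ("I", "O"): 0.2,
    ("I", "I-same"): 1.0, ("I", "I-other"): -2.0,
}

def expand(types):
    """Expand abstract role transitions to a concrete BIO label set."""
    labels = ["O"] + [f"{p}-{t}" for t in types for p in ("B", "I")]
    T = [[0.0] * len(labels) for _ in labels]
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            pa, ta = (a.split("-") + [None])[:2]  # role prefix, entity type
            pb, tb = (b.split("-") + [None])[:2]
            if pb == "I" and pa in ("B", "I"):
                key = (pa, "I-same" if ta == tb else "I-other")
            else:
                key = (pa, pb)
            T[i][j] = ABSTRACT[key]
    return labels, T

def viterbi(emissions, T):
    """Decode the best label path from emission + transition scores."""
    n = len(T)
    score = list(emissions[0])
    back = []
    for em in emissions[1:]:
        prev, step_back, new = score, [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: prev[i] + T[i][j])
            step_back.append(best_i)
            new.append(prev[best_i] + T[best_i][j] + em[j])
        score, _ = new, back.append(step_back)
    path = [max(range(n), key=lambda j: score[j])]
    for step_back in reversed(back):
        path.append(step_back[path[-1]])
    return path[::-1]

# Unseen target types; emissions would come from the matching network.
labels, T = expand(["time", "loc"])
emis = [[-5.0, 5.0, -5.0, -5.0, -5.0],   # token 1: strongly B-time
        [-5.0, -5.0, 1.0, -5.0, 1.0]]    # token 2: I-time vs. I-loc tied
print([labels[i] for i in viterbi(emis, T)])  # → ['B-time', 'I-time']
```

Note that the tie between I-time and I-loc at the second token is resolved purely by the transferred transition scores (B followed by I of the *same* type outscores a type switch), which is exactly the benefit the dependency transfer mechanism is meant to provide when emissions alone are ambiguous.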
