Enhanced prototypical network for few-shot relation extraction

Abstract Most existing methods for relation extraction depend heavily on large-scale annotated data; they cannot learn from existing knowledge and generalize poorly. Further development of few-shot learning methods is needed to address these problems. The most commonly used encoder, the CNN, is weak at sequence labeling and at capturing long-range dependencies, so we propose a novel model that integrates the transformer into a prototypical network for more powerful relation-level feature extraction. The transformer connects tokens directly, which lets it adapt to long sequences without catastrophic forgetting, and it obtains richer semantic information for each word by learning from several representation subspaces in parallel. We evaluate our method on three tasks: in-domain, cross-domain and cross-sentence relation extraction. Our method achieves a trade-off between performance and computation and improves on the state-of-the-art prototypical network by approximately 8% across different settings. Our experiments also show that the approach is competitive with other few-shot learning methods on cross-domain transfer and cross-sentence relation extraction.
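The classification step of a prototypical network, which the abstract builds on, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes the sentence embeddings have already been produced by some encoder (the transformer, in the proposed model), it uses squared Euclidean distance as in the original prototypical network, and the relation labels, vectors, and function names are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def class_prototypes(
    support: List[Tuple[Hashable, Vector]]
) -> Dict[Hashable, List[float]]:
    """Average the support embeddings of each relation into one prototype."""
    grouped = defaultdict(list)
    for label, embedding in support:
        grouped[label].append(embedding)
    # zip(*embeddings) iterates over dimensions, so each prototype is a mean vector.
    return {
        label: [sum(dim) / len(embeddings) for dim in zip(*embeddings)]
        for label, embeddings in grouped.items()
    }


def squared_euclidean(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classify(query: Vector, prototypes: Dict[Hashable, List[float]]) -> Hashable:
    """Return the relation whose prototype is nearest to the query embedding."""
    return min(prototypes, key=lambda label: squared_euclidean(query, prototypes[label]))


# Toy 2-way 2-shot episode; the embeddings stand in for encoder outputs (assumed).
support = [
    ("founder_of", [1.0, 0.0]),
    ("founder_of", [0.8, 0.2]),
    ("born_in", [0.0, 1.0]),
    ("born_in", [0.2, 0.8]),
]
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.1], prototypes)
```

In this sketch only the encoder that produces the embeddings would change between a CNN-based and a transformer-based prototypical network; the prototype averaging and nearest-prototype rule stay the same.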
