YNUNLP at SemEval-2023 Task 2: The Pseudo Twin Tower Pre-training Model for Chinese Named Entity Recognition

This paper describes our system for SemEval-2023 Task 2: MultiCoNER II, Multilingual Complex Named Entity Recognition, Track 9 (Chinese). The task focuses on detecting fine-grained named entities; its data set uses a fine-grained taxonomy of 36 NE classes, which represents a realistic challenge for NER. Systems must identify entity boundaries and category labels for the six identified categories. We use BERT embeddings to represent each character of the input sentence and train a CRF-Rdrop model to predict named entity categories, using the data set provided by the organizers. Our best submission achieved a macro-averaged F1 score of 0.5657 and ranked 15th out of 22 teams.
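To make the reported metric concrete, the following is a minimal sketch of an entity-level macro-averaged F1 computation. It assumes that a predicted entity counts as correct only when its boundaries and label both match a gold entity exactly, and that the per-class F1 scores are averaged with equal weight. The function name `macro_f1`, the span tuple layout, and the labels `PER` and `LOC` are illustrative assumptions, not the official task scorer or its label set.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Entity-level macro-averaged F1.

    gold, pred: sets of (sentence_id, start, end, label) tuples.
    This tuple layout is an assumption made for illustration.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    fn = defaultdict(int)

    # A predicted span is a true positive only on an exact match
    # of sentence, boundaries, and label.
    for span in pred:
        if span in gold:
            tp[span[3]] += 1
        else:
            fp[span[3]] += 1

    # Gold spans that were not predicted exactly are false negatives.
    for span in gold:
        if span not in pred:
            fn[span[3]] += 1

    labels = {s[3] for s in gold} | {s[3] for s in pred}
    scores = []
    for label in sorted(labels):
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)

    # Macro average: every class contributes equally, regardless of frequency.
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: hypothetical spans, not taken from the task data set.
gold = {(0, 0, 2, "PER"), (0, 5, 7, "LOC"), (1, 0, 3, "PER")}
pred = {(0, 0, 2, "PER"), (0, 5, 7, "PER"), (1, 0, 3, "PER")}
score = macro_f1(gold, pred)
```

Because each class is weighted equally, a rare class that is never predicted correctly lowers the score as much as a frequent one, which is one reason a large fine-grained taxonomy makes this metric demanding.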
