The FinSim-2 2021 Shared Task: Learning Semantic Similarities for the Financial Domain

FinSim-2 is the second edition of the FinSim Shared Task on Learning Semantic Similarities for the Financial Domain, co-located with the FinWeb workshop. FinSim-2 challenged participants to automatically learn effective and precise semantic models for the financial domain. This second edition offered a dataset enriched in both volume and quality, and sought systems that make creative use of relevant resources such as ontologies and lexica, as well as systems that use contextual word embeddings such as BERT [4]. Going beyond the mere representation of words is a key step toward industrial applications of Natural Language Processing (NLP). This is typically addressed using either unsupervised corpus-derived representations such as word embeddings, which are opaque to human understanding but very useful in NLP applications, or manually created resources such as taxonomies and ontologies, which tend to have low coverage and contain inconsistencies but provide a deeper understanding of the target domain. FinSim is inspired by previous endeavours in the SemEval community, which has organized several competitions on semantic/lexical relation extraction between concepts/words. This year, 7 teams submitted 18 system runs, and systems were ranked according to two metrics, Accuracy and Mean Rank. All systems beat our baseline 1 model by over 15 points in accuracy, and the best systems beat baseline 2 by 1 to 3 points.
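As a rough illustration of the two ranking metrics named above, the following Python sketch computes Accuracy and Mean Rank from ranked label predictions. It assumes that each system outputs, for every term, a list of candidate labels ordered from most to least likely, that Accuracy counts how often the top-ranked label is the gold label, and that Mean Rank averages the 1-based position of the gold label in that list. These definitions, the function names, the penalty for a missing gold label, and the toy labels are assumptions made for illustration, not the organizers' official scorer.

```python
from typing import List, Sequence


def accuracy(gold: Sequence[str], ranked: Sequence[List[str]]) -> float:
    """Fraction of terms whose top-ranked predicted label equals the gold label."""
    hits = sum(1 for g, r in zip(gold, ranked) if r and r[0] == g)
    return hits / len(gold)


def mean_rank(gold: Sequence[str], ranked: Sequence[List[str]]) -> float:
    """Average 1-based position of the gold label in each ranked prediction list.

    Assumption: a gold label missing from the list is penalized with
    rank len(list) + 1.
    """
    total = 0
    for g, r in zip(gold, ranked):
        total += r.index(g) + 1 if g in r else len(r) + 1
    return total / len(gold)


# Toy data with illustrative labels (not taken from the task dataset).
gold_labels = ["Bonds", "Swap", "Funds"]
predictions = [
    ["Bonds", "Swap", "Funds"],  # gold label at rank 1
    ["Funds", "Swap", "Bonds"],  # gold label at rank 2
    ["Swap", "Bonds", "Funds"],  # gold label at rank 3
]

acc = accuracy(gold_labels, predictions)
mr = mean_rank(gold_labels, predictions)
print(f"accuracy={acc:.3f} mean_rank={mr:.3f}")
```

Under these assumptions, higher Accuracy and lower Mean Rank both indicate a better system; Mean Rank additionally rewards systems that place the correct label near the top even when the first guess is wrong.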

[1] Nitesh V. Chawla, et al. SMOTE: Synthetic Minority Over-sampling Technique, 2002, J. Artif. Intell. Res.

[2] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[3] Paul Buitelaar, et al. SemEval-2016 Task 13: Taxonomy Extraction Evaluation (TExEval-2), 2016, SemEval.

[4] Paul Buitelaar, et al. SemEval-2015 Task 17: Taxonomy Extraction Evaluation (TExEval), 2015, SemEval@NAACL-HLT.

[5] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[6] Dong Yu, et al. SHIKEBLCU at SemEval-2020 Task 2: An External Knowledge-enhanced Matrix for Multilingual and Cross-Lingual Lexical Entailment, 2020, SemEval@COLING.

[7] Goran Glavaš, et al. SemEval-2020 Task 2: Predicting Multilingual and Cross-Lingual (Graded) Lexical Entailment, 2020, SemEval.

[8] Ilya Sutskever, et al. Language Models are Unsupervised Multitask Learners, 2019.

[9] Ismaïl El Maarouf, et al. The FinSim 2020 Shared Task: Learning Semantic Representations for the Financial Domain, 2021, FINNLP.

[10] Andras Kornai, et al. BMEAUT at SemEval-2020 Task 2: Lexical Entailment with Semantic Graphs, 2020, SemEval@COLING.

[11] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[12] Jun Zhao, et al. FinBERT: A Pre-trained Financial Language Representation Model for Financial Text Mining, 2020, IJCAI.

[13] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[14] Grzegorz Kondrak, et al. UAlberta at SemEval-2020 Task 2: Using Translations to Predict Cross-Lingual Entailment, 2020, SemEval@COLING.