Ranked List Fusion and Re-ranking with Pre-trained Transformers for ARQMath Lab

This paper describes our submission to the ARQMath track at CLEF 2021. For this year's submission we use a collection of methods to retrieve and re-rank answers from Math Stack Exchange, in addition to our two-stage model, which was comparable to the best model last year in terms of NDCG′. We also provide a detailed analysis of what the transformers are learning and why it is hard to train a math language model using transformers. This year's submission to Task-1 includes summarizing long question-answer pairs to augment and index documents, using byte-pair encoding to tokenize formulas and then re-rank them, and extracting important keywords from posts. Using an ensemble of these methods, our approach shows a 20% improvement over our ARQMath 2020 Task-1 submission.
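
As a rough illustration of the formula tokenization step mentioned in the abstract, the sketch below trains a byte-pair-encoding tokenizer on LaTeX formula strings using the HuggingFace `tokenizers` library. The library choice, the vocabulary size, the `formulas` list, and the example query are assumptions for illustration only, not details taken from our system.

```python
# Minimal sketch (assumption): train a BPE tokenizer over LaTeX formula strings
# so that formulas can be segmented into subword units before re-ranking.
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# Hypothetical corpus of formulas extracted from Math Stack Exchange posts.
formulas = [
    r"\frac{x^2 + 1}{x - 1}",
    r"\int_0^\infty e^{-x^2} \, dx",
    r"\sum_{n=1}^{\infty} \frac{1}{n^2}",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()

# Vocabulary size is an arbitrary placeholder for this sketch.
trainer = BpeTrainer(vocab_size=1000, special_tokens=["[UNK]", "[CLS]", "[SEP]"])
tokenizer.train_from_iterator(formulas, trainer)

# Tokenize a query formula into BPE subword units.
encoding = tokenizer.encode(r"\frac{x^2 + 1}{x - 1}")
print(encoding.tokens)
```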