Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching

Semantic text matching models have been widely used in community question answering, information retrieval, and dialogue. However, these models struggle with the long-form text matching problem: long-form texts typically contain a large amount of noise, and it is difficult for existing semantic text matching models to capture the key matching signals amid this noisy information. Moreover, these models are computationally expensive because they process all textual content indiscriminately during matching. To tackle both the effectiveness and the efficiency problems, we propose a novel hierarchical noise filtering model in this paper, namely Match-Ignition. The basic idea is to plug the well-known PageRank algorithm into the Transformer to identify and filter both sentence-level and word-level noise in the matching process. Since the sentence is the basic unit of a long-form text, noisy sentences are relatively easy to detect, so we filter them directly by applying PageRank to a sentence similarity graph. Words, in contrast, rely on their contexts to express concrete meanings, so we jointly learn the filtering process and the matching process to reflect the contextual dependencies between words. Specifically, a word graph is first built from the attention scores in each self-attention block of the Transformer, and keywords are then selected by applying PageRank to this graph. In this way, noisy words are filtered out layer by layer during matching. Experimental results show that Match-Ignition outperforms both traditional short-text matching models and recent long-form text matching models. We also conduct detailed analyses showing that Match-Ignition efficiently captures the important sentences and words that are helpful for long-form text matching.

ACM Reference Format:
Liang Pang, Yanyan Lan, and Xueqi Cheng. 2021. Match-Ignition: Plugging PageRank into Transformer for Long-form Text Matching. In Proceedings of ACM Conference (Conference'17). ACM, New York, NY, USA, 9 pages.
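To make the word-level filtering step concrete, the following is a minimal sketch of PageRank over an attention-derived word graph: the attention matrix is treated as a weighted directed graph over tokens, PageRank scores are computed by power iteration, and only the top-scoring tokens are kept. This is an illustration under our own assumptions (the attention matrix is taken as already averaged over heads, and the helper names `pagerank_scores` and `keep_top_k` are hypothetical), not the paper's actual implementation.

```python
import numpy as np

def pagerank_scores(attn: np.ndarray, damping: float = 0.85,
                    iters: int = 50, tol: float = 1e-6) -> np.ndarray:
    """Power-iteration PageRank over a token-to-token weight matrix.

    attn[i, j] is read as the weight of edge i -> j (e.g., an attention
    score averaged over heads). Rows are renormalised so the matrix is
    row-stochastic before iterating.
    """
    n = attn.shape[0]
    # Each token distributes its "vote" over the tokens it attends to.
    trans = attn / np.maximum(attn.sum(axis=1, keepdims=True), 1e-12)
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1.0 - damping) / n + damping * (trans.T @ scores)
        if np.abs(new - scores).sum() < tol:
            return new
        scores = new
    return scores

def keep_top_k(tokens, attn, k):
    """Keep the k highest-PageRank tokens, preserving their original order."""
    scores = pagerank_scores(attn)
    top = np.sort(np.argsort(-scores)[:k])
    return [tokens[i] for i in top], top

# Toy usage: a 5-token input with a random head-averaged attention matrix.
rng = np.random.default_rng(0)
tokens = ["the", "model", "filters", "noisy", "words"]
attn = rng.random((5, 5))
kept, idx = keep_top_k(tokens, attn, k=3)
print(kept, idx)
```

In the model described above, a step like this would be applied after each self-attention block, so the token sequence shrinks layer by layer as noisy words are dropped.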
