Knowledge Enhanced Sports Game Summarization

Sports game summarization aims to generate sports news from live commentaries. However, existing datasets are all constructed through automated collection and cleaning processes, and thus contain substantial noise. Moreover, prior work neglects the knowledge gap between live commentaries and sports news, which limits the performance of sports game summarization. In this paper, we introduce K-SportsSum, a new dataset with two characteristics: (1) K-SportsSum collects a large amount of data from a massive number of games, comprising 7,854 commentary-news pairs, and employs a manual cleaning process to improve data quality; (2) unlike existing datasets, K-SportsSum additionally provides a large-scale knowledge corpus covering 523 sports teams and 14,724 sports players, in order to narrow the knowledge gap. We further propose a knowledge-enhanced summarizer that exploits both live commentaries and the knowledge corpus to generate sports news. Extensive experiments on the K-SportsSum and SportsSum datasets show that our model achieves new state-of-the-art performance. Qualitative analysis and a human study further verify that our model generates more informative sports news.
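To make the knowledge-enhanced setup concrete, below is a minimal sketch of one common way to fuse an external knowledge corpus with commentary text: retrieve knowledge entries for entities mentioned in the commentary and prepend them to the input of a pretrained seq2seq summarizer. This is an illustration only, not the paper's actual pipeline: the `knowledge_corpus` entries, `find_entities`, and `build_input` helpers are hypothetical, and it assumes the HuggingFace Transformers library with an mT5 checkpoint (an off-the-shelf, non-fine-tuned model will not produce usable news; a task-specific fine-tuned checkpoint would be needed in practice).

```python
# Minimal sketch: fuse retrieved knowledge with live commentary as seq2seq input.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical knowledge corpus: entity name -> short factual description.
knowledge_corpus = {
    "Lionel Messi": "Argentine forward playing for FC Barcelona.",
    "FC Barcelona": "Spanish club competing in La Liga.",
}

def find_entities(commentary: str) -> list[str]:
    """Toy entity linker via substring match against the corpus keys.
    A real system would use NER / entity linking instead."""
    return [name for name in knowledge_corpus if name in commentary]

def build_input(commentary: str) -> str:
    """Prepend retrieved knowledge to the commentary so the encoder
    sees both; simple concatenation is one basic fusion strategy."""
    facts = " ".join(knowledge_corpus[e] for e in find_entities(commentary))
    return f"knowledge: {facts} commentary: {commentary}"

# Any pretrained seq2seq model works for the sketch; mT5 is one option.
tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")

commentary = "23' Lionel Messi curls a free kick just wide for FC Barcelona."
inputs = tokenizer(build_input(commentary), return_tensors="pt", truncation=True)
summary_ids = model.generate(**inputs, max_new_tokens=64, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```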

[1]  Li Yang,et al.  ETC: Encoding Long and Structured Inputs in Transformers , 2020, EMNLP.

[2]  Gina-Anne Levow,et al.  The Third International Chinese Language Processing Bakeoff: Word Segmentation and Named Entity Recognition , 2006, SIGHAN@COLING/ACL.

[3]  Xiaojun Wan,et al.  Content Selection for Real-time Sports News Construction from Commentary Texts , 2017, INLG.

[4]  Christopher D. Manning,et al.  Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.

[5]  Colin Raffel,et al.  mT5: A Massively Multilingual Pre-trained Text-to-Text Transformer , 2021, NAACL.

[6]  Kilian Q. Weinberger,et al.  BERTScore: Evaluating Text Generation with BERT , 2019, ICLR.

[7]  Omer Levy,et al.  BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension , 2019, ACL.

[8]  Alexander M. Rush,et al.  Abstractive Sentence Summarization with Attentive Recurrent Neural Networks , 2016, NAACL.

[9]  Han Ren,et al.  Sports News Generation from Live Webcast Scripts Based on Rules and Templates , 2016, NLPCC/ICCPOL.

[10]  Arman Cohan,et al.  Longformer: The Long-Document Transformer , 2020, ArXiv.

[11]  Jianshe Zhou,et al.  Research on Summary Sentences Extraction Oriented to Live Sports Text , 2016, NLPCC/ICCPOL.

[12]  Mirella Lapata,et al.  Sentence Centrality Revisited for Unsupervised Summarization , 2019, ACL.

[13]  Qiang Yang,et al.  SportsSum2.0: Generating High-Quality Sports News from Live Text Commentary , 2021, CIKM.

[14]  Lysandre Debut,et al.  HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.

[15]  Xiaojun Wan,et al.  Overview of the NLPCC-ICCPOL 2016 Shared Task: Sports News Generation from Live Webcast Scripts , 2016, NLPCC/ICCPOL.

[16]  Ani Nenkova,et al.  Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , 2016, NAACL 2016.

[17]  Chen Li,et al.  Generating Sports News from Live Commentary: A Chinese Dataset for Sports Game Summarization , 2020, AACL.

[18]  Jason Weston,et al.  A Neural Attention Model for Abstractive Sentence Summarization , 2015, EMNLP.

[19]  Lukasz Kaiser,et al.  Attention is All you Need , 2017, NIPS.

[20]  Rada Mihalcea,et al.  TextRank: Bringing Order into Text , 2004, EMNLP.

[21]  Yoon Kim,et al.  Convolutional Neural Networks for Sentence Classification , 2014, EMNLP.

[22]  Xipeng Qiu,et al.  FLAT: Chinese NER Using Flat-Lattice Transformer , 2020, ACL.

[23]  Jianshe Zhou,et al.  Generate Football News from Live Webcast Scripts Based on Character-CNN with Five Strokes , 2020 .

[24]  Bowen Zhou,et al.  Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond , 2016, CoNLL.

[25]  Xiaojun Wan,et al.  Towards Constructing Sports News from Live Text Commentary , 2016, ACL.

[26]  Colin Raffel,et al.  Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , 2019, J. Mach. Learn. Res..