Multi-lingual Wikipedia Summarization and Title Generation on a Low-Resource Corpus

The MultiLing 2019 Headline Generation Task on the Wikipedia corpus raises a critical and practical problem: performing a multilingual task on a low-resource corpus. In this paper we propose QDAS, an extractive summarization model enhanced by sentence2vec, and apply transfer learning based on a large multilingual pre-trained language model to the Wikipedia headline generation task. We treat headline generation as a sequence labeling task and develop two schemes to handle it. Experimental results show that a large pre-trained model can effectively utilize learned knowledge to extract the relevant phrases from low-resource supervised data.
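To make the sequence-labeling formulation concrete, the sketch below fine-tunes a multilingual pre-trained encoder for token classification: each source token is tagged as KEEP (it appears in the headline) or O, and the kept tokens are concatenated into the generated title. This is a minimal illustration under assumptions, not the paper's QDAS implementation: the model name, the binary KEEP/O label scheme, and the Hugging Face `transformers` API are choices made here for exposition.

```python
# Minimal sketch (assumed setup, not the authors' code): headline generation
# cast as token-level sequence labeling with a multilingual pre-trained model.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL = "bert-base-multilingual-cased"  # assumed; any multilingual encoder works
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Two labels: 0 = O (drop token), 1 = KEEP (token belongs to the headline).
# The classification head is randomly initialized and must first be
# fine-tuned on (article, headline) pairs with token-level KEEP/O labels.
model = AutoModelForTokenClassification.from_pretrained(MODEL, num_labels=2)

def predict_headline_tokens(article: str, max_len: int = 512) -> str:
    """Tag article tokens; the tokens labeled KEEP form the headline."""
    enc = tokenizer(article, truncation=True, max_length=max_len,
                    return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits          # shape: (1, seq_len, 2)
    labels = logits.argmax(dim=-1)[0]         # shape: (seq_len,)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    kept = [t for t, l in zip(tokens, labels) if l.item() == 1]
    return tokenizer.convert_tokens_to_string(kept)
```

Because the pre-trained encoder already carries multilingual knowledge, only the small classification head and a light fine-tuning pass depend on the low-resource supervised data, which is the transfer-learning effect the abstract describes.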
