Design and Implementation of an Automatic Summarizer Using Extractive and Abstractive Methods

Communication via natural language requires two fundamental skills: producing ‘text’ (written or spoken) and understanding it. Here, two terms become important: natural language understanding (NLU) and natural language generation (NLG). NLU is the task in which a system disambiguates input sentences (in some human or natural language) to produce a machine representation of their meaning. NLG, on the other hand, is the task of generating natural language from a machine representation (a database or logical form). An example of a simple NLG system is the Pollen Forecast for Scotland system (Turner et al., 2006 [3]), which is essentially a template: it takes as input six numbers that predict pollen levels in different parts of Scotland and generates a short textual summary of those levels as its output. In this paper, our aim is to test and, where necessary, improve pre-existing algorithms for our NLG system and, if required, to develop our own algorithms for it. The paper starts by defining the stages and components of the NLG task and their distinctive roles in accounting for the coherence and appropriateness of natural texts. It then sets out the principal methods that have been developed in the field for building working computational systems. Thereafter, we present our attempt to define a new method for the application being developed. Finally, the problems faced in developing an NLG system and its potential applications are discussed.
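To make the template idea concrete, the following is a minimal sketch (in Python) of a data-to-text template in the spirit of the pollen forecast example. The region names, level thresholds, and wording are illustrative assumptions only; they are not the actual rules of the Turner et al. system.

```python
# Minimal sketch of a template-based NLG system: six regional pollen
# predictions in, one short textual summary out. Regions, thresholds,
# and phrasing are hypothetical, chosen only to illustrate the idea.

REGIONS = ["Central", "North East", "North West",
           "South East", "South West", "Islands"]


def level_phrase(value: float) -> str:
    """Map a numeric pollen prediction to a qualitative phrase."""
    if value < 3:
        return "low"
    if value < 6:
        return "moderate"
    return "high"


def pollen_summary(values: list[float]) -> str:
    """Generate a short textual summary from six regional pollen numbers."""
    if len(values) != len(REGIONS):
        raise ValueError("expected one value per region")
    # Per-region phrases filled into a fixed sentence template.
    parts = [f"{region}: {level_phrase(v)}" for region, v in zip(REGIONS, values)]
    worst = REGIONS[max(range(len(values)), key=values.__getitem__)]
    return (f"Pollen levels today peak at {level_phrase(max(values))}, "
            f"highest in the {worst}. " + "; ".join(parts) + ".")


if __name__ == "__main__":
    print(pollen_summary([2.1, 4.5, 7.3, 3.8, 5.0, 1.2]))
```

Even this toy version shows the limitation that motivates more general NLG methods: every change in the desired output wording requires editing the template by hand.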

[1] Lukasz Kaiser, et al. Attention Is All You Need, 2017, NIPS.

[2] Trevor Darrell, et al. Sequence to Sequence -- Video to Text, 2015, IEEE International Conference on Computer Vision (ICCV).

[3] Ehud Reiter, et al. Generating Spatio-Temporal Descriptions in Pollen Forecasts, 2006, EACL.

[4] Ming-Wei Chang, et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2019, NAACL.

[5] Christopher D. Manning, et al. Get To The Point: Summarization with Pointer-Generator Networks, 2017, ACL.

[6] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[7] Xin Jiang, et al. GPT-based Generation for Classical Chinese Poetry, 2019, arXiv.

[8] Xu Tan, et al. MASS: Masked Sequence to Sequence Pre-training for Language Generation, 2019, ICML.

[9] Emiel Krahmer, et al. Survey of the State of the Art in Natural Language Generation: Core Tasks, Applications and Evaluation, 2017, Journal of Artificial Intelligence Research.

[10] Richard H. R. Hahnloser, et al. Data-driven Summarization of Scientific Articles, 2018, arXiv.

[11] Jeffrey Pennington, et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[12] Yiming Yang, et al. XLNet: Generalized Autoregressive Pretraining for Language Understanding, 2019, NeurIPS.

[13] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[14] Naren Ramakrishnan, et al. Neural Abstractive Text Summarization with Sequence-to-Sequence Models, 2018, Transactions on Data Science.

[15] Yoshua Bengio, et al. End-to-end Attention-based Large Vocabulary Speech Recognition, 2016, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).