Using BERT Encoding and Sentence-Level Language Model for Sentence Ordering

Discovering the logical sequence of events is one of the cornerstones of Natural Language Understanding. One approach to learning the sequence of events is to study the order of sentences in a coherent text. Sentence ordering can be applied in various tasks such as retrieval-based Question Answering, document summarization, storytelling, text generation, and dialogue systems. Furthermore, we can learn to model text coherence by learning how to order a set of shuffled sentences. Previous research has relied on RNN, LSTM, and BiLSTM architectures for learning text language models; however, these networks perform poorly because they lack an attention mechanism. We propose an algorithm for sentence ordering in a corpus of short stories. Our proposed method uses a language model based on Universal Transformers (UT) that captures dependencies between sentences through its attention mechanism. Our method improves on the previous state of the art in Perfect Match Ratio (PMR) on the ROCStories dataset, a corpus of nearly 100K short human-made stories. The proposed model includes three components: Sentence Encoder, Language Model, and Sentence Arrangement with Brute Force Search. The first component generates sentence embeddings using the SBERT-WK pre-trained model fine-tuned on the ROCStories data. A Universal Transformer network then learns a sentence-level language model over these embeddings. During decoding, the network generates a candidate embedding for the sentence expected to follow the current one. We use cosine similarity to score this candidate embedding against the embeddings of the remaining sentences in the shuffled set. Finally, a Brute Force Search is employed to find the ordering that maximizes the sum of similarities between pairs of consecutive sentences.
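To make the arrangement step concrete, the following is a minimal sketch of the Brute Force Search described above: given precomputed sentence embeddings and a language model that predicts the embedding of the following sentence, it scores consecutive pairs with cosine similarity and keeps the permutation with the highest total score. The function names, shapes, and the `predict_next` interface are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of the Sentence Arrangement with Brute Force Search step.
# Assumes `embeddings` come from a sentence encoder (e.g. a fine-tuned
# SBERT-WK model) and `predict_next` wraps the sentence-level language
# model; both names are hypothetical.
from itertools import permutations

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def order_sentences(embeddings: np.ndarray, predict_next) -> list[int]:
    """Return the ordering (as indices into `embeddings`) that maximizes the
    sum of cosine similarities between each predicted next-sentence embedding
    and the embedding of the sentence actually placed next.

    embeddings   -- array of shape (n_sentences, dim), one row per shuffled sentence
    predict_next -- callable mapping a sentence embedding to the language
                    model's predicted embedding of the following sentence
    """
    n = embeddings.shape[0]
    best_order, best_score = None, float("-inf")
    for order in permutations(range(n)):  # exhaustive search; feasible for 5-sentence stories
        score = 0.0
        for cur, nxt in zip(order, order[1:]):
            candidate = predict_next(embeddings[cur])    # LM's guess for the next sentence
            score += cosine(candidate, embeddings[nxt])  # how well the guess matches this placement
        if score > best_score:
            best_order, best_score = order, score
    return list(best_order)
```

Exhaustive enumeration is tractable here only because ROCStories items contain five sentences (120 permutations); longer documents would require a beam search or other approximate ordering strategy.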
