Deep Attentive Ranking Networks for Learning to Order Sentences

We present an attention-based ranking framework for learning to order sentences given a paragraph. Our framework is built on a bidirectional sentence encoder and a self-attention-based transformer network to obtain an input-order-invariant representation of paragraphs. Moreover, it allows seamless training with a variety of ranking-based loss functions, including pointwise, pairwise, and listwise losses. We apply our framework to two tasks: Sentence Ordering and Order Discrimination. Our framework outperforms various state-of-the-art methods on these tasks across a variety of evaluation metrics. We also show that it achieves better results with pairwise and listwise ranking losses than with the pointwise ranking loss, which suggests that incorporating the relative positions of two or more sentences in the loss function contributes to better learning.
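To make the contrast between the three loss families concrete, the sketch below implements standard pointwise, pairwise, and listwise ranking losses over per-sentence scores. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact formulations, so the function names, the margin value, and the position-to-relevance mapping are illustrative assumptions (the listwise variant shown is ListNet-style).

```python
import torch
import torch.nn.functional as F

# Convention used throughout this sketch (an assumption, not from the paper):
# the model emits one score per sentence, and a HIGHER score means the
# sentence should appear EARLIER in the paragraph.

def relevance(gold_positions: torch.Tensor) -> torch.Tensor:
    # Map gold positions (0 = first) to relevances (first sentence highest).
    n = gold_positions.size(0)
    return (n - 1 - gold_positions).float()

def pointwise_loss(scores, gold_positions):
    # Treat ordering as regression: push each sentence's score toward its
    # relevance, independently of all other sentences.
    return F.mse_loss(scores, relevance(gold_positions))

def pairwise_loss(scores, gold_positions, margin=1.0):
    # Hinge loss over all ordered pairs: if sentence i precedes sentence j
    # in the gold order, its score should exceed j's by at least `margin`.
    s_diff = scores.unsqueeze(1) - scores.unsqueeze(0)           # (n, n)
    precedes = (gold_positions.unsqueeze(1)
                < gold_positions.unsqueeze(0)).float()           # 1 iff i before j
    violations = F.relu(margin - s_diff) * precedes
    return violations.sum() / precedes.sum().clamp(min=1)

def listwise_loss(scores, gold_positions):
    # ListNet-style loss: match the softmax over predicted scores to the
    # softmax over gold relevances, scoring the whole ordering jointly.
    return F.kl_div(F.log_softmax(scores, dim=0),
                    F.softmax(relevance(gold_positions), dim=0),
                    reduction='sum')

# Toy usage: five sentences, scores from a (hypothetical) ranking head.
scores = torch.randn(5, requires_grad=True)
gold = torch.tensor([2, 0, 1, 4, 3])   # gold position of each sentence
listwise_loss(scores, gold).backward()
```

Under this framing, the reported advantage of the pairwise and listwise losses is intuitive: the pointwise loss scores each sentence in isolation, while the other two directly penalize misordered pairs or the entire permutation.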
