Text Segmentation by Cross Segment Attention

Document and discourse segmentation are two fundamental NLP tasks that break text into constituents, which are commonly used to support downstream tasks such as information retrieval and text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a new state of the art, in particular reducing error rates by a large margin in all cases. We further analyze model sizes and find that we can build models with far fewer parameters while maintaining good performance, thus facilitating real-world applications.
