SpanPredict: Extraction of Predictive Document Spans with Neural Attention

In many natural language processing applications, identifying predictive text can be as important as the predictions themselves. When predicting medical diagnoses, for example, identifying predictive content in clinical notes not only enhances interpretability, but also allows unknown, descriptive (i.e., text-based) risk factors to be identified. Here we formalize this problem as predictive extraction and address it using a simple mechanism based on linear attention. Our method preserves differentiability, allowing scalable inference via stochastic gradient descent. Further, the model decomposes predictions into a sum of contributions of distinct text spans. Importantly, we require only document labels, not ground-truth spans. Results show that our model identifies semantically cohesive spans and assigns them scores that agree with human ratings, while preserving classification performance.
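To make the core idea concrete, below is a minimal sketch (not the authors' implementation) of a model in this spirit: candidate spans are scored with a linear attention head, the document-level logit is an explicit sum of per-span contributions, and everything stays differentiable so training with stochastic gradient descent needs only document labels. The class name, mean-pooled span representation, and span-enumeration scheme are illustrative assumptions.

```python
# Illustrative sketch only, assuming a linear-attention span scorer whose
# document prediction decomposes into a sum of span contributions.
import torch
import torch.nn as nn

class SpanContributionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=128, max_span_len=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attn = nn.Linear(embed_dim, 1)   # linear attention: one weight per span
        self.score = nn.Linear(embed_dim, 1)  # per-span contribution to the logit
        self.max_span_len = max_span_len

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        x = self.embed(token_ids)  # (B, T, D)
        T = x.size(1)
        span_reprs, span_ix = [], []
        for length in range(1, min(self.max_span_len, T) + 1):
            # Mean-pool every contiguous span of this length.
            pooled = x.unfold(1, length, 1).mean(dim=-1)  # (B, T-length+1, D)
            span_reprs.append(pooled)
            span_ix += [(s, s + length) for s in range(T - length + 1)]
        spans = torch.cat(span_reprs, dim=1)  # (B, S, D)
        weights = torch.softmax(self.attn(spans).squeeze(-1), dim=-1)  # (B, S)
        contrib = weights * self.score(spans).squeeze(-1)  # (B, S) span contributions
        logit = contrib.sum(dim=-1)  # document prediction = sum of contributions
        return logit, contrib, span_ix
```

Because the logit is an explicit sum over spans, ranking spans by `contrib` yields the extracted evidence at no extra cost, and training reduces to an ordinary document-level loss (e.g., `torch.nn.BCEWithLogitsLoss`) with no span annotations.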
