Text summarization using unsupervised deep learning

Highlights:
- Unsupervised extractive summarization of emails using a deep auto-encoder, with excellent performance.
- The Ensemble Noisy Auto-Encoder runs noisy inputs through one trained network, enhancing performance.
- Summaries are highly informative and semantically similar to human abstracts.

We present methods of extractive, query-oriented, single-document summarization that use a deep auto-encoder (AE) to compute a feature space from the term-frequency (tf) input. Our experiments explore both local and global vocabularies. We investigate the effect of adding small random noise to the local tf as the input representation of the AE, and propose an ensemble of such noisy AEs, which we call the Ensemble Noisy Auto-Encoder (ENAE). The ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs; in each run of the ensemble, a different randomly generated noise is added to the input representation. This architecture changes the application of the AE from a deterministic feed-forward network to a stochastic runtime model. Experiments show that an AE using local vocabularies provides a clearly more discriminative feature space and improves recall by 11.2% on average. The ENAE makes further improvements, particularly in selecting informative sentences. To cover a wide range of topics and structures, we perform experiments on two publicly available email corpora specifically designed for text summarization. We use ROUGE as a fully automatic summarization metric and report the average ROUGE-2 recall for all experiments.
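The ensemble selection step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `encode` callable stands in for a trained AE encoder, the Gaussian noise scale and voting scheme are assumptions, and sentences are ranked by cosine similarity between their encoding and the query's encoding, with votes accumulated over noisy runs.

```python
import numpy as np

def ensemble_noisy_select(tf, encode, query_vec, n_runs=10, noise_scale=0.01, k=2, seed=0):
    """Sketch of Ensemble Noisy Auto-Encoder (ENAE) sentence selection.

    tf        : (n_sentences, vocab) term-frequency matrix
    encode    : maps tf rows to the AE feature space (placeholder for a trained encoder)
    query_vec : (vocab,) term-frequency vector of the query
    Each run adds fresh random noise to the input; the top-k sentences of
    that run each receive one vote, and the most-voted sentences win.
    """
    rng = np.random.default_rng(seed)
    votes = np.zeros(tf.shape[0])
    for _ in range(n_runs):
        # a different randomly generated noise is added in each run
        noisy = tf + rng.normal(0.0, noise_scale, size=tf.shape)
        h = encode(noisy)                                  # sentence embeddings
        q = encode(query_vec[None, :]
                   + rng.normal(0.0, noise_scale, size=(1, tf.shape[1])))
        # cosine similarity of each sentence encoding to the query encoding
        sims = (h @ q.T).ravel() / (np.linalg.norm(h, axis=1)
                                    * np.linalg.norm(q) + 1e-12)
        for idx in np.argsort(sims)[::-1][:k]:
            votes[idx] += 1
    return np.argsort(votes)[::-1][:k]                     # most-voted sentences
```

With a small noise scale the ensemble behaves like the deterministic AE but breaks ties differently across runs; increasing the noise trades stability for diversity in the selected sentences.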
