Is your document novel? Let attention guide you. An attention-based model for document-level novelty detection

Detecting whether a document contains enough new information to be deemed novel is of immense significance in this age of data duplication. Existing techniques for document-level novelty detection mostly operate at the lexical level and are unable to address semantic-level redundancy. These techniques usually rely on handcrafted features extracted from the documents in a rule-based or traditional feature-based machine learning setup. Here, we present an effective approach based on a neural attention mechanism to detect document-level novelty without any manual feature engineering. We contend that a simple alignment of texts between the source and target document(s) could identify the state of novelty of a target document. Our deep neural architecture elicits inference knowledge from a large-scale natural language inference dataset, which proves crucial to the novelty detection task. Our approach outperforms the standard baselines and recent work on document-level novelty detection by a margin of ∼3% in terms of accuracy.
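
To make the alignment idea concrete, the following is a minimal sketch of soft attention between a target document and its source document(s), assuming each sentence has already been encoded as a fixed-size vector. It is not the architecture described in this paper: the function names `soft_align` and `novelty_score`, the dot-product scoring, and the cosine-distance readout are all illustrative assumptions, and the toy vectors stand in for real sentence embeddings.

```python
import numpy as np

def soft_align(target, source):
    """Map each target vector to a softmax-weighted mix of source vectors."""
    target = np.asarray(target, dtype=float)
    source = np.asarray(source, dtype=float)
    scores = target @ source.T                           # (n_target, n_source) dot-product similarity
    scores = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ source                              # (n_target, dim) aligned source summary

def novelty_score(target, source):
    """Mean cosine distance between target vectors and their aligned source mix.

    Hypothetical readout: values near 0 suggest the target is covered by the
    source, values near 1 suggest it is not.
    """
    target = np.asarray(target, dtype=float)
    aligned = soft_align(target, source)
    num = (target * aligned).sum(axis=1)
    den = np.linalg.norm(target, axis=1) * np.linalg.norm(aligned, axis=1) + 1e-12
    return float(np.mean(1.0 - num / den))

# Toy data (assumed, not from the paper): two source "sentences" and two targets.
source_vecs = np.array([[5.0, 0.0, 0.0],
                        [0.0, 5.0, 0.0]])
redundant_target = np.array([[5.0, 0.0, 0.0]])  # repeats a source sentence
novel_target = np.array([[0.0, 0.0, 5.0]])      # orthogonal to every source sentence

print(novelty_score(redundant_target, source_vecs))  # close to 0
print(novelty_score(novel_target, source_vecs))      # close to 1
```

In a learned model the similarity scores and the final decision would come from trained layers rather than raw dot products and a fixed distance, so this sketch only shows why aligning target text against source text exposes what the source does not cover.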
