Semantic Matching Against a Corpus: New Applications and Methods

We consider the case of a domain expert who wishes to explore the extent to which a particular idea is expressed in a text collection. We propose the task of semantically matching the idea, expressed as a natural language proposition, against a corpus. We create two preliminary tasks derived from existing datasets, and then introduce a more realistic one on disaster recovery designed for emergency managers, whom we engaged in a user study. On the latter, we find that a new model built from natural language entailment data produces higher-quality matches than simple word-vector averaging, both on expert-crafted queries and on ones produced by the subjects themselves. This work provides a proof-of-concept for such applications of semantic matching and illustrates key challenges.

Noah A. Smith | Scott B. Miles | Lucy H. Lin
