Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus

Over the past decade, large-scale supervised learning corpora have enabled machine learning researchers to make substantial advances. However, to this date, there are no large-scale question-answer corpora available. In this paper we present the 30M Factoid Question-Answer Corpus, an enormous question answer pair corpus produced by applying a novel neural network architecture on the knowledge base Freebase to transduce facts into natural language questions. The produced question answer pairs are evaluated both by human evaluators and using automatic evaluation metrics, including well-established machine translation and sentence similarity metrics. Across all evaluation criteria the question-generation model outperforms the competing template-based baseline. Furthermore, when presented to human evaluators, the generated questions appear comparable in quality to real human-generated questions.

[1]  Jason Weston,et al.  Large-scale Simple Question Answering with Memory Networks , 2015, ArXiv.

[2]  Yoshua Bengio,et al.  Practical Recommendations for Gradient-Based Training of Deep Architectures , 2012, Neural Networks: Tricks of the Trade.

[3]  Luísa Coheur,et al.  Question Generation based on Lexico-Syntactic Patterns Learned from the Web , 2012, Dialogue Discourse.

[4]  Gregory Aist,et al.  Generating Questions Automatically from Informational Text , 2009 .

[5]  Arthur C. Graesser,et al.  Question Generation from Concept Maps , 2012, Dialogue Discourse.

[6]  Quoc V. Le,et al.  Addressing the Rare Word Problem in Neural Machine Translation , 2014, ACL.

[7]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[8]  Jürgen Schmidhuber,et al.  LSTM: A Search Space Odyssey , 2015, IEEE Transactions on Neural Networks and Learning Systems.

[9]  Jason Weston,et al.  Open Question Answering with Weakly Supervised Embedding Models , 2014, ECML/PKDD.

[10]  Jason Weston,et al.  Translating Embeddings for Modeling Multi-relational Data , 2013, NIPS.

[11]  Ewan Klein,et al.  Generating Natural Language from Linked Data , 2013 .

[12]  Vasile Rus,et al.  A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics , 2012, BEA@NAACL-HLT.

[13]  Jonathan Berant,et al.  Semantic Parsing via Paraphrasing , 2014, ACL.

[14]  Ellen M. Voorhees,et al.  Overview of the TREC-9 Question Answering Track , 2000, TREC.

[15]  Daniel Duma,et al.  Generating Natural Language from Linked Data: Unsupervised template extraction , 2013, IWCS.

[16]  Jason Weston,et al.  Learning Structured Embeddings of Knowledge Bases , 2011, AAAI.

[17]  Jeffrey Dean,et al.  Distributed Representations of Words and Phrases and their Compositionality , 2013, NIPS.

[18]  Yi Zhang,et al.  Semantics-based Question Generation and Implementation , 2012, Dialogue Discourse.

[19]  Markus Krötzsch,et al.  Wikidata: a free collaborative knowledgebase , 2014, Commun. ACM.

[20]  Xuchen Yao,et al.  Question Generation with Minimal Recursion Semantics , 2010 .

[21]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[22]  Quoc V. Le,et al.  Sequence to Sequence Learning with Neural Networks , 2014, NIPS.

[23]  Yoshua Bengio,et al.  Neural Machine Translation by Jointly Learning to Align and Translate , 2014, ICLR.

[24]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[25]  Ramanathan V. Guha,et al.  Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .

[26]  Paul Piwek,et al.  The First Question Generation Shared Task Evaluation Challenge , 2010, Dialogue Discourse.

[27]  Sadid A. Hasan,et al.  Automation of Question Generation From Sentences , 2011 .

[28]  Praveen Paritosh,et al.  Freebase: a collaboratively created graph database for structuring human knowledge , 2008, SIGMOD Conference.

[29]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[30]  A. Graesser,et al.  QUEST: A model of question answering☆ , 1992 .

[31]  Rashmi Prasad,et al.  Question Generation from Paragraphs at UPenn: QGSTEC System Description , 2010 .

[32]  Enrico Motta,et al.  Is Question Answering fit for the Semantic Web?: A survey , 2011, Semantic Web.

[33]  Jimmy J. Lin,et al.  Web question answering: is more always better? , 2002, SIGIR '02.

[34]  Xinlei Chen,et al.  Microsoft COCO Captions: Data Collection and Evaluation Server , 2015, ArXiv.