A Syntactic Approach to Domain-Specific Automatic Question Generation

Factoid questions are questions that require short, fact-based answers. Automatic question generation (AQG) that produces factoid questions from a given text can contribute to educational activities, interactive question answering systems, search engines, and other applications. The goal of our research is to generate factoid source-question-answer triplets for a specific domain. We propose a four-component pipeline that takes as input a training corpus of domain-specific documents, along with a set of declarative sentences from the same domain, and produces as output a set of factoid questions that refer to the source sentences but differ slightly from them, so that a question-answering system or a person can be asked a question that requires deeper understanding and knowledge than simple word matching. In contrast to existing domain-specific AQG systems, which use a template-based approach to question generation, we transform each source sentence into a set of questions by applying a series of domain-independent rules (a syntax-based approach). We evaluated the pipeline in the cyber security domain through a series of experiments on each component separately and on the end-to-end system. The proposed approach generated a higher percentage of acceptable questions than a prior state-of-the-art AQG system.
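To make the idea of a domain-independent transformation rule concrete, the following is a minimal sketch of a single rule that turns a declarative sentence into a source-question-answer triplet. It is an illustration only and not the rule set described in the paper: the function name `subject_question`, the auxiliary list `AUX`, and the use of a regular expression in place of a full syntactic parse are all assumptions made for this example.

```python
import re

# Hypothetical list of auxiliaries the toy rule recognises (an assumption,
# not taken from the paper).
AUX = ("is", "are", "was", "were", "can", "will")

# Toy pattern: "<subject> <aux> <rest>." A real syntactic approach would
# operate on a parse tree rather than on the surface string.
PATTERN = re.compile(
    r"^(?P<subj>.+?)\s+(?P<aux>" + "|".join(AUX) + r")\s+(?P<rest>.+?)\.?$"
)

def subject_question(sentence):
    """Return a (source, question, answer) triplet, or None if the rule does not apply."""
    match = PATTERN.match(sentence.strip())
    if match is None:
        return None
    # Replace the subject with "What" and keep the rest of the clause;
    # the removed subject becomes the expected answer.
    question = f"What {match['aux']} {match['rest']}?"
    return sentence, question, match["subj"]

if __name__ == "__main__":
    print(subject_question("A firewall is a network security system."))
```

Because the rule refers only to sentence structure and not to domain vocabulary, the same function applies unchanged to sentences from any domain; this is the property that distinguishes it from a domain-specific template.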
