Sentence extraction with topic modeling for question–answer pair generation

Automatic question–answer (QA) pair generation has recently become an essential technique for reducing human involvement in the construction of QA systems. In the big data era, vast amounts of information are produced every day, so QA systems need to respond to users with up-to-date information, e.g., to answer questions about recent blog posts. The major problem in building such systems is efficiently capturing text sources that are relevant to a specific QA domain. In this study, topic modeling is used to determine efficiently whether an article belongs to the same topic as a specific domain of interest, with the health domain serving as the example in this paper. QA pairs are then generated from the selected articles using the proposed sentence extraction method. Experimental results show that the proposed method with topic modeling achieved a 7.3% improvement in the acceptance rate of the generated questions.
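The article-selection step can be illustrated with a minimal sketch. It is not the method used in this study: it swaps the topic model for a much simpler stand-in, cosine similarity between term-frequency vectors, only to show the shape of the filtering decision (keep an article if it is close enough to the domain of interest). The function names, the tokenizer, and the 0.2 threshold are assumptions introduced for illustration.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on non-letters, and count the tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_domain_articles(domain_text, articles, threshold=0.2):
    """Keep the articles whose similarity to the domain text meets the threshold.

    `domain_text` stands in for a learned domain topic; `threshold` is a
    hypothetical cut-off, not a value taken from the paper.
    """
    domain_vec = term_frequencies(domain_text)
    return [
        article
        for article in articles
        if cosine_similarity(term_frequencies(article), domain_vec) >= threshold
    ]
```

In a fuller pipeline, the term-frequency vectors would be replaced by per-document topic distributions inferred by a topic model, and the selected articles would then be passed to sentence extraction.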
