Socratic Question Generation: A Novel Dataset, Models, and Evaluation

Socratic questioning is a form of reflective inquiry often employed in education to encourage critical thinking in students, and in therapeutic counseling to elicit a subject's awareness of their own beliefs and perspectives. Specific types of Socratic questions are used to prompt reasoning and alternative views in the context of an individual's personal opinions on a topic. Socratic contexts differ from traditional question generation contexts, where “answer-seeking” questions are generated from a formal passage on a topic, a narrative story, or a conversation. We present SocratiQ, the first large dataset of 110K (question, context) pairs for enabling studies on Socratic Question Generation (SoQG). We provide an in-depth study of the various types of Socratic questions and present models that generate Socratic questions for a given context through prompt tuning. Our automated and human evaluation results demonstrate that our SoQG models can produce realistic, type-sensitive, human-like Socratic questions, enabling potential applications in counseling and coaching.
