Mixed-Initiative Active Learning for Generating Linguistic Insights in Question Classification

We propose a mixed-initiative active learning system for building descriptive models of under-studied linguistic phenomena. Our use case is the linguistic analysis of question types, specifically what characterizes information-seeking vs. non-information-seeking questions (i.e., whether or not the speaker wants to elicit an answer from the hearer), and how automated methods can assist with this analysis. Our approach is motivated by the need for an effective and efficient human-in-the-loop process in natural language processing that relies on example-based learning and provides immediate feedback to the user. In addition to the concrete implementation of a question classification system, we describe general paradigms of explainable mixed-initiative learning that allow the user to access the patterns identified automatically by the system, rather than being confronted with a machine learning black box. Our user study demonstrates the capability of our system to provide deep linguistic insight into this particular analysis problem. The results of our evaluation are competitive with the current state of the art.
