Learning with rationales for document classification

We present a simple yet effective approach to document classification that incorporates rationales elicited from annotators into the training of any off-the-shelf classifier. We show empirically on several document classification datasets that our classifier-agnostic approach, which makes no assumptions about the underlying classifier, can effectively incorporate rationales into the training of multinomial naïve Bayes, logistic regression, and support vector machines. In addition to being classifier-agnostic, our method performs comparably to previous classifier-specific approaches developed for incorporating rationales and feature annotations. Finally, we propose and evaluate an active learning method tailored specifically to the learning-with-rationales framework.
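The abstract does not spell out how rationales are folded into training. The code below is a minimal sketch that assumes one plausible classifier-agnostic scheme. Before training, it rescales each labeled document's feature vector so that the terms the annotator highlighted as rationales keep full weight, while all other terms are shrunk by a small factor. The function names, the weights `r_weight=1.0` and `o_weight=0.01`, and the dict-based document representation are illustrative assumptions and do not come from the text. The change is confined to the training data, so the learner itself is left untouched.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A bag-of-words document: term -> count (or tf-idf value).
Doc = Dict[str, float]

# Assumed, illustrative weights: rationale terms keep full strength,
# every other term is shrunk toward zero.
R_WEIGHT = 1.0
O_WEIGHT = 0.01


def reweight_with_rationales(
    doc: Doc,
    rationale_terms: Set[str],
    r_weight: float = R_WEIGHT,
    o_weight: float = O_WEIGHT,
) -> Doc:
    """Scale each feature by r_weight if the annotator marked it as a
    rationale, otherwise by o_weight."""
    return {
        term: value * (r_weight if term in rationale_terms else o_weight)
        for term, value in doc.items()
    }


def prepare_training_set(
    examples: Iterable[Tuple[Doc, str, Optional[Set[str]]]],
) -> Tuple[List[Doc], List[str]]:
    """Turn (document, label, rationales) triples into plain (X, y) data.

    Documents with no rationales (None or an empty set) are copied
    unchanged, so the output is ordinary training data for any learner.
    """
    X: List[Doc] = []
    y: List[str] = []
    for doc, label, rationales in examples:
        if rationales:
            X.append(reweight_with_rationales(doc, rationales))
        else:
            X.append(dict(doc))
        y.append(label)
    return X, y


if __name__ == "__main__":
    train = [
        ({"great": 2.0, "plot": 1.0, "acting": 1.0}, "pos", {"great"}),
        ({"awful": 1.0, "plot": 1.0}, "neg", None),
    ]
    X, y = prepare_training_set(train)
    print(X[0])  # {'great': 2.0, 'plot': 0.01, 'acting': 0.01}
    print(X[1])  # {'awful': 1.0, 'plot': 1.0}
```

The resulting dictionaries are ordinary nonnegative feature vectors. For example, they could be vectorized with scikit-learn's `DictVectorizer` and passed to `MultinomialNB`, `LogisticRegression`, or `LinearSVC` without any change to those classes. The sketch handles rationales entirely in a preprocessing step, and that separation is what keeps it independent of the classifier.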
