EXPLAINING DATA-DRIVEN DOCUMENT CLASSIFICATIONS

Many document classification applications require that managers, client-facing employees, and the technical team understand the reasons for data-driven classification decisions. Predictive models treat documents as data to be classified, and document data are characterized by very high dimensionality, often with tens of thousands to millions of variables (words). This high dimensionality makes the decisions of document classifiers very difficult to understand. This paper begins by extending the most relevant prior theoretical model of explanations for intelligent systems to account for some missing elements. The main theoretical contribution is the definition of a new sort of explanation: a minimal set of words (more generally, terms) such that removing all words in the set from the document changes the predicted class from the class of interest. We present an algorithm to find such explanations, as well as a framework to assess such an algorithm’s performance. We demonstrate the value of the new approach with a case study from a real-world document classification task: classifying web pages as containing objectionable content, with the goal of allowing advertisers to choose not to have their ads appear on those pages. A second empirical demonstration, on news-story topic classification, shows that the explanations are concise and document-specific, and that they can provide an understanding of the exact reasons for the classification decisions, of the workings of the classification models, and of the business application itself. We also illustrate how explaining the classifications of documents can help to improve data quality and model performance.
