Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking

Machine-learned models are often described as "black boxes". In many real-world applications, however, models may have to sacrifice predictive power in favour of human interpretability. When this is the case, feature engineering becomes a crucial task, requiring significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates consumed). Nonetheless, once a model is learned from the data, each prediction it makes on a new instance is effectively final, as every instance is assumed to be a static point in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and (iii) how such modifications alter the prediction once the tweaked instance is fed back to the model. In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach on an online advertising application. First, we design a Random Forest classifier that effectively separates two classes of ads: low (negative) and high (positive) quality ads (instances). Then, we introduce an algorithm that provides recommendations aimed at transforming a low quality ad (negative instance) into a high quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of Yahoo Gemini, a large ad network.
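To make the approach concrete, below is a minimal sketch of the path-based tweaking idea, assuming a scikit-learn RandomForestClassifier. The helper names (positive_paths, tweak), the fixed epsilon margin, and the Euclidean tweaking cost are illustrative assumptions, not the paper's exact implementation: every root-to-leaf path ending in a positive leaf induces a candidate instance that epsilon-satisfies the path's conditions, and the lowest-cost candidate that flips the ensemble's prediction is returned.

```python
# Minimal sketch of actionable feature tweaking for a random forest.
# Assumptions (not from the paper's code): scikit-learn trees, a fixed
# epsilon margin, and a Euclidean tweaking cost.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def positive_paths(tree, positive_class=1):
    """Enumerate root-to-leaf paths whose leaf predicts positive_class.
    A path is a list of (feature, threshold, direction) conditions,
    where '<=' means the left branch was taken and '>' the right one."""
    t = tree.tree_
    paths = []

    def walk(node, conds):
        if t.children_left[node] == -1:            # leaf node
            if np.argmax(t.value[node][0]) == positive_class:
                paths.append(list(conds))
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], conds + [(f, thr, '<=')])
        walk(t.children_right[node], conds + [(f, thr, '>')])

    walk(0, [])
    return paths

def tweak(forest, x, epsilon=0.1, positive_class=1):
    """Return the lowest-cost variant of x that the whole forest labels
    positive, or None if no path-induced candidate flips the prediction."""
    best, best_cost = None, np.inf
    for tree in forest.estimators_:
        for path in positive_paths(tree, positive_class):
            cand = x.copy()
            for f, thr, d in path:                 # epsilon-satisfy the path
                if d == '<=' and cand[f] > thr:
                    cand[f] = thr - epsilon
                elif d == '>' and cand[f] <= thr:
                    cand[f] = thr + epsilon
            # accept the candidate only if the *ensemble* flips its label
            if forest.predict(cand.reshape(1, -1))[0] == positive_class:
                cost = np.linalg.norm(cand - x)    # Euclidean tweaking cost
                if cost < best_cost:
                    best, best_cost = cand, cost
    return best

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=20, max_depth=5,
                            random_state=0).fit(X, y)
neg = X[rf.predict(X) == 0][0]   # an instance the forest predicts negative
print(tweak(rf, neg))            # a tweaked, positively predicted variant
```

In practice, tweaking would be restricted to the adjustable features, and the epsilon margin and cost function tuned to the application; the sketch verifies each candidate against the full forest before accepting it, so any returned instance is guaranteed to be positively predicted.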
