Feature Scoring using Tree-Based Ensembles for Evolving Data Streams

Assigning scores to individual features is a popular method for estimating their relevance in supervised learning. Accurate feature scores provide essential insights in sensitive domains, where explaining how features influence a given decision is decisive and contributes to the interpretability of the model. Learning from streaming data adds further challenges, including limited computational resources and changes to the underlying data distribution (i.e., evolving data streams). In this work, we introduce and analyze methods to efficiently estimate the Mean Decrease in Impurity (MDI) and COVER measures using ensembles of incremental decision trees. To keep the scores up to date in evolving data streams, we employ tree ensembles that incorporate active drift detection. Experimental results show how MDI and COVER can track feature scores as the importance of each feature to the ensemble model shifts over time. In addition, we show how the feature scores are affected when the learning problem includes a non-negligible verification latency before labels arrive. We also present a counter-intuitive experiment on a standard benchmark dataset in which the feature scores correctly reflect the importance of two features to the ensemble model; however, these features are prioritized due to biased split decisions, and the model's predictive performance increases in their absence. We conclude that the presented measures help to better understand the impact of features on the ensemble model, yet they should be used with caution, as they are limited by the biases of the underlying tree-building process and ensemble model.
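
For reference, a minimal sketch of how MDI and COVER are commonly defined for a tree ensemble; the streaming estimators introduced in the paper adapt such quantities to incrementally grown Hoeffding trees, and the notation below is an assumption rather than taken from the paper: $T$ denotes the number of trees, $S_t$ the set of split nodes in tree $t$, $v(s)$ the feature tested at split $s$, $n(s)$ the number of instances reaching $s$, $N$ the total number of instances, and $\Delta i(s)$ the impurity decrease produced by split $s$.

\[
\mathrm{MDI}(X_j) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{s \in S_t \\ v(s) = j}} \frac{n(s)}{N}\, \Delta i(s),
\qquad
\mathrm{COVER}(X_j) = \frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{s \in S_t \\ v(s) = j}} n(s)
\]

In a streaming setting, $n(s)$ and $\Delta i(s)$ can presumably be maintained from the counts and sufficient statistics already kept at each tree node, so the scores stay current as trees split or are reset by the drift detector.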
