Using ensembles for problems with characterizable changes in data distribution: A case study on quantification

Highlights:
- Ensembles are well suited for problems with changes in data distribution.
- If those changes are characterizable, ensembles can be designed to tackle them.
- The idea is to generate different training samples based on the expected distribution changes.
- As a case study, we present ensemble versions of two binary quantification algorithms.
- The ensembles outperform the original counterpart algorithms even with trivial aggregation rules.

Abstract: Ensemble methods are widely applied to supervised learning tasks. Despite relying on a simple strategy, they often achieve good performance, especially when the single models comprising the ensemble are diverse. Diversity can be introduced by creating a different training sample for each model; each model is then trained on a data distribution that may differ from that of the original training set. Following this idea, this paper analyzes the hypothesis that ensembles are especially appropriate for problems that (i) suffer from distribution changes and (ii) allow those changes to be characterized beforehand. The idea consists in generating different training samples based on the expected distribution changes and training one model on each of them. As a case study, we focus on binary quantification problems, introducing ensemble versions of two well-known quantification algorithms. Experimental results show that these ensemble adaptations outperform the original counterpart algorithms, even when trivial aggregation rules are used.
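The abstract does not spell out the two quantification algorithms, but the core recipe it describes can be sketched: resample the training set at several expected class prevalences, train one base quantifier per sample, and aggregate the prevalence estimates with a trivial rule (here, the mean). The sketch below assumes Forman-style Adjusted Count as the base quantifier and uses a toy one-dimensional threshold classifier; all function names are hypothetical and the construction is an illustration of the idea, not the paper's exact method.

```python
import numpy as np

def adjusted_count(classify, X, tpr, fpr):
    """Forman's Adjusted Count: correct the raw positive rate
    using the classifier's estimated tpr and fpr."""
    cc = classify(X).mean()  # raw classify-and-count estimate
    if tpr - fpr <= 0:       # degenerate classifier: fall back to CC
        return float(cc)
    return float(np.clip((cc - fpr) / (tpr - fpr), 0.0, 1.0))

def resample_at_prevalence(X, y, p, n, rng):
    """Draw a bootstrap sample of size n with positive prevalence p."""
    pos, neg = np.where(y == 1)[0], np.where(y == 0)[0]
    n_pos = int(round(p * n))
    idx = np.concatenate([rng.choice(pos, n_pos, replace=True),
                          rng.choice(neg, n - n_pos, replace=True)])
    return X[idx], y[idx]

def train_threshold(X, y):
    """Toy 1-D classifier: threshold at the midpoint of class means."""
    t = (X[y == 1].mean() + X[y == 0].mean()) / 2.0
    return lambda Z: (Z >= t).astype(int)

def ensemble_quantify(Xtr, ytr, Xte, prevalences, n, seed=0):
    """Train one quantifier per expected prevalence, average estimates."""
    rng = np.random.default_rng(seed)
    estimates = []
    for p in prevalences:
        Xs, ys = resample_at_prevalence(Xtr, ytr, p, n, rng)
        clf = train_threshold(Xs, ys)
        pred = clf(Xs)
        tpr = pred[ys == 1].mean()  # rates estimated on the member's sample
        fpr = pred[ys == 0].mean()
        estimates.append(adjusted_count(clf, Xte, tpr, fpr))
    return float(np.mean(estimates))  # trivial aggregation rule: the mean
```

Because each member sees a different class prevalence, the ensemble covers the range of distribution changes expected at test time, which is exactly the source of diversity the abstract argues for.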
