Enhancing Classification of Ecological Momentary Assessment Data Using Bagging and Boosting

Ecological Momentary Assessment (EMA) techniques are gaining ground in studies and data collection across disciplines. Decision tree algorithms and their ensemble variants are widely used for classifying this type of data, since they are easy to use and provide satisfactory results. However, most of these algorithms do not take into account the multiple levels (per-subject, per-day, etc.) at which EMA data are organized. In this paper we explore how the organization of EMA data can be taken into account when building decision trees, and specifically how a combination of bagging and boosting can be utilized in a classification task. We propose a new algorithm, BBT (Bagged Boosted Trees), enhanced by an over/under-sampling method that yields better estimates of the conditional class probability function. BBT's necessity and effects are demonstrated using both simulated datasets and real-world EMA data collected with a mobile application that followed the eating behavior of 100 people. Experimental analysis shows that BBT leads to clear improvements in both prediction error reduction and conditional class probability estimation.
