MetaStream: A meta-learning based method for periodic algorithm selection in time-changing data

Dynamic real-world applications that generate data continuously have introduced new challenges for the machine learning community, since the concepts to be learned are likely to change over time. In such scenarios, a model that is appropriate at a given time point may rapidly become obsolete, requiring updating or replacement. As there are several learning algorithms available, choosing the one whose bias best suits the current data is not a trivial task. In this paper, we present a meta-learning based method for periodic algorithm selection in time-changing environments, named MetaStream. It works by mapping characteristics extracted from past and incoming data to the predictive performance of regression models, in order to choose between individual learning algorithms or their combination. Experimental results for two real regression problems showed that MetaStream is able to improve the overall performance of the learning system compared with a baseline method and an ensemble-based approach.
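The abstract describes the periodic selection loop only at a high level. The code below is a minimal sketch of that loop under stated assumptions: scikit-learn base regressors (SVR and a regression tree), a random-forest meta-learner, a handful of simple statistical meta-features, and a synthetic drifting stream. Window sizes, learners and meta-features are illustrative choices, not the authors' exact implementation.

```python
# Sketch of meta-learning based periodic algorithm selection on a data stream.
# Assumptions (illustrative, not from the paper): scikit-learn base regressors,
# a random-forest meta-learner, simple statistical meta-features, synthetic data.
import numpy as np
from scipy.stats import skew, kurtosis
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic drifting stream: the input-output relationship changes over time.
n, d = 3000, 5
X = rng.normal(size=(n, d))
w = np.sin(np.linspace(0, 3 * np.pi, n))[:, None] * np.ones((1, d))
y = (X * w).sum(axis=1) + 0.1 * rng.normal(size=n)

base_learners = {"svr": SVR(), "tree": DecisionTreeRegressor(max_depth=5)}

def meta_features(Xw, yw):
    """Simple statistical meta-features of a data window (illustrative set)."""
    corr = [abs(np.corrcoef(Xw[:, j], yw)[0, 1]) for j in range(Xw.shape[1])]
    return np.array([yw.mean(), yw.std(), skew(yw), kurtosis(yw),
                     np.mean(corr), np.max(corr)])

train_w, sel_w = 200, 50   # training window and selection (evaluation) window
step = sel_w               # selection is revisited periodically

# 1) Build the meta-dataset from past windows: meta-features -> best base learner.
meta_X, meta_y = [], []
for start in range(0, n - train_w - 2 * sel_w, step):
    tr = slice(start, start + train_w)
    ev = slice(start + train_w, start + train_w + sel_w)
    errors = {}
    for name, learner in base_learners.items():
        learner.fit(X[tr], y[tr])
        errors[name] = mean_squared_error(y[ev], learner.predict(X[ev]))
    meta_X.append(meta_features(X[tr], y[tr]))
    meta_y.append(min(errors, key=errors.get))

meta_learner = RandomForestClassifier(n_estimators=100, random_state=0)
meta_learner.fit(np.array(meta_X), meta_y)

# 2) At run time: characterize the most recent window, let the meta-learner pick
#    a base algorithm, retrain it on that window and predict the incoming data.
start = n - train_w - sel_w
tr = slice(start, start + train_w)
incoming = slice(start + train_w, n)
chosen = meta_learner.predict(meta_features(X[tr], y[tr]).reshape(1, -1))[0]
model = base_learners[chosen]
model.fit(X[tr], y[tr])
print("selected:", chosen,
      "MSE on incoming data:",
      mean_squared_error(y[incoming], model.predict(X[incoming])))
```

In this sketch the meta-level task is a classification problem (which base regressor will perform best on the next window), which is the general idea the abstract refers to; a combination of the base learners could be chosen analogously by adding it as a further meta-label.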
