Multistream regression with asynchronous concept drift detection

A recently introduced problem setting, referred to as multistream, involves two independent non-stationary data generating processes. One, called the source stream, continuously generates data instances together with their true outputs. The other, called the target stream, generates data instances that lack true outputs. Given the nature of data streams, past studies have addressed prediction under scenarios such as covariate shift or concept drift, typically relaxing one assumption while keeping the others fixed. For example, they assume that training and test data follow similar distributions, and that the true outputs of stream instances become available shortly afterwards. In practice, however, these assumptions do not always hold. The multistream regression problem is to predict the outputs of the target stream using the data instances and true outputs from the source stream. In this paper, we propose an approach to multistream regression that incorporates concept drift detection into covariate shift adaptation. Empirical evaluation on synthetic and real-world datasets, in comparison with state-of-the-art approaches, demonstrates the effectiveness of the proposed technique: experimental results indicate that our method significantly improves prediction performance over existing benchmarks.
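
The approach above combines two standard ingredients: covariate shift adaptation, which re-weights labelled source instances by an estimate of the density ratio w(x) = p_target(x) / p_source(x), and concept drift detection, which decides when that re-weighting must be refreshed. The sketch below is a minimal, hypothetical illustration of how the two pieces can fit together for 1-D regression; it is not the authors' algorithm. Its assumptions: the density ratio comes from Gaussian fits to each window (a crude stand-in for dedicated estimators such as KMM, KLIEP or uLSIF), the regressor is weighted linear least squares, a two-sided Page-Hinkley test watches only the unlabelled target inputs, and the data and detector parameters are synthetic and hand-tuned. All names (`density_ratio_weights`, `weighted_linear_fit`, `PageHinkley`, `run_demo`) are hypothetical.

```python
import math
import random
from statistics import fmean, pvariance


def gaussian_pdf(x, mean, var):
    """Density of the normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def density_ratio_weights(source_x, target_x, eps=1e-6):
    """Estimate w(x) = p_target(x) / p_source(x) at every source input.

    ASSUMPTION: both 1-D input distributions are approximated by Gaussians
    fitted to the current windows (a stand-in for KMM, KLIEP or uLSIF).
    """
    ms, vs = fmean(source_x), pvariance(source_x) + eps
    mt, vt = fmean(target_x), pvariance(target_x) + eps
    return [gaussian_pdf(x, mt, vt) / gaussian_pdf(x, ms, vs) for x in source_x]


def weighted_linear_fit(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    b = sxy / sxx
    return ybar - b * xbar, b


class PageHinkley:
    """Two-sided Page-Hinkley test for a shift in the mean of a stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta, self.threshold = delta, threshold
        self.n, self.mean, self.up, self.down = 0, 0.0, 0.0, 0.0

    def update(self, value):
        """Consume one observation; return True when drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n  # running mean
        self.up = max(0.0, self.up + value - self.mean - self.delta)
        self.down = max(0.0, self.down + self.mean - value - self.delta)
        return max(self.up, self.down) > self.threshold


def run_demo(seed=0):
    """Hypothetical scenario: labelled source stream, unlabelled target stream.

    True function y = x**2. Source inputs ~ N(0, 1). Target inputs follow
    N(0, 1) for 300 steps, then shift to mean 2, standard deviation 0.5.
    Returns (drift_step, mse_unweighted, mse_weighted) on post-drift data.
    """
    rng = random.Random(seed)
    src_x = [rng.gauss(0, 1) for _ in range(2000)]
    src_y = [x * x + rng.gauss(0, 0.1) for x in src_x]
    tgt_x = [rng.gauss(0, 1) if t < 300 else rng.gauss(2, 0.5) for t in range(600)]

    # Watch the unlabelled target inputs; delta/threshold are hand-tuned
    # for this toy shift (assumption, not values from the paper).
    detector, drift_at = PageHinkley(delta=1.0, threshold=8.0), None
    for t, x in enumerate(tgt_x):
        if detector.update(x):
            drift_at = t
            break
    if drift_at is None:
        return None

    # Covariate shift adaptation: re-weight source data towards the
    # 100 target inputs collected after the alarm.
    recent = tgt_x[drift_at:drift_at + 100]
    a_u, b_u = weighted_linear_fit(src_x, src_y, [1.0] * len(src_x))
    a_w, b_w = weighted_linear_fit(src_x, src_y, density_ratio_weights(src_x, recent))

    # Score on fresh post-drift inputs; labels are used only for evaluation.
    test_x = [rng.gauss(2, 0.5) for _ in range(1000)]
    mse_u = fmean((a_u + b_u * x - x * x) ** 2 for x in test_x)
    mse_w = fmean((a_w + b_w * x - x * x) ** 2 for x in test_x)
    return drift_at, mse_u, mse_w
```

Calling `run_demo()` returns the step at which drift was flagged and the post-drift mean squared errors of the unweighted and importance-weighted fits. On this toy data the weighted model should track the shifted target region far better, because an unweighted linear fit of y = x² is tuned to where the source inputs concentrate. The detector runs on target inputs because target outputs are never observed; a fuller treatment would likely monitor the source stream as well.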
