A Distributed Online Learning Approach for Pattern Prediction over Movement Event Streams with Apache Flink

In this paper, we present a distributed online prediction system for user-defined patterns over multiple massive streams of movement events, built on the general-purpose stream processing framework Apache Flink. The proposed approach combines probabilistic event pattern prediction models on multiple predictor nodes with a distributed online learning protocol that continuously learns the parameters of a global prediction model and shares them among the predictors in a communication-efficient way. Our approach enables collaborative learning between the predictors (i.e., they "learn from each other"), so learning is accelerated and each predictor needs less data. The underlying model provides online predictions about when a pattern (i.e., a regular expression over the event types) is expected to be completed within each event stream. We describe the distributed architecture of the proposed system and its implementation in Flink, and we present experimental results over real-world event streams related to trajectories of moving vessels.
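One standard way to predict when a regular-expression pattern will be completed is a pattern Markov chain (Nuel, 2008): the pattern's DFA is combined with a Markov model of the event stream, and the forecast is the waiting-time distribution, i.e. the probability that an accepting state is first reached after exactly k more events. The sketch below assumes a first-order Markov model of event types and a hand-built DFA; `completion_distribution`, `earliest_completion`, and the toy alphabet {a, b} are hypothetical, and the predictor in the actual system may differ in model order, state construction, and how forecasts are reported.

```python
from typing import Dict, List, Optional, Set, Tuple

Symbol = str
# Hypothetical DFA encoding: (state, symbol) -> next state.
Dfa = Dict[Tuple[int, Symbol], int]
# Assumed first-order Markov model of the stream: trans[a][b] = P(next = b | last = a).
Transitions = Dict[Symbol, Dict[Symbol, float]]


def completion_distribution(dfa: Dfa, accepting: Set[int], trans: Transitions,
                            dfa_state: int, last_event: Symbol,
                            horizon: int) -> List[float]:
    """w[k-1] = P(the pattern is first completed exactly k events from now)."""
    # Probability mass over non-accepting product states (DFA state, last event).
    mass: Dict[Tuple[int, Symbol], float] = {(dfa_state, last_event): 1.0}
    waiting: List[float] = []
    for _ in range(horizon):
        nxt: Dict[Tuple[int, Symbol], float] = {}
        completed = 0.0
        for (q, a), p in mass.items():
            for b, p_ab in trans[a].items():
                q_next = dfa[(q, b)]
                if q_next in accepting:
                    # Mass entering an accepting state completes the pattern at this
                    # step and is not propagated further (first completion only).
                    completed += p * p_ab
                else:
                    key = (q_next, b)
                    nxt[key] = nxt.get(key, 0.0) + p * p_ab
        waiting.append(completed)
        mass = nxt
    return waiting


def earliest_completion(waiting: List[float], threshold: float) -> Optional[int]:
    """Smallest k whose cumulative completion probability reaches threshold, else None."""
    total = 0.0
    for k, w in enumerate(waiting, start=1):
        total += w
        if total >= threshold:
            return k
    return None


# Toy pattern: "a" immediately followed by "b" anywhere in the stream, i.e. (a|b)*ab.
# State 1 means "just saw a"; state 2 is accepting.
DFA_AB: Dfa = {(0, "a"): 1, (0, "b"): 0,
               (1, "a"): 1, (1, "b"): 2}
UNIFORM: Transitions = {"a": {"a": 0.5, "b": 0.5}, "b": {"a": 0.5, "b": 0.5}}

w = completion_distribution(DFA_AB, {2}, UNIFORM, dfa_state=0, last_event="b", horizon=4)
print(w)                            # [0.0, 0.25, 0.25, 0.1875]
print(earliest_completion(w, 0.5))  # 3
```

With uniform transitions this reproduces the textbook waiting time for "ab", P(T = k) = (k - 1) / 2^k. In a streaming setting, the product state is advanced on every incoming event and the distribution is recomputed from the current state, so the forecast is refreshed online.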
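The abstract does not spell out the synchronization protocol itself. As a minimal sketch, assume a dynamic-averaging scheme in the style of Kamp et al. (ECML/PKDD 2014): each predictor node updates its own copy of a first-order transition matrix over event types, and the models are averaged and redistributed only when some node's squared distance from the last synchronized model exceeds a threshold `delta`. The exponential-moving-average update, `PredictorNode`, `Coordinator`, `lr`, and `delta` are illustrative assumptions, not the system's actual Flink operators or estimator.

```python
from typing import List, Optional

Matrix = List[List[float]]


def copy_matrix(m: Matrix) -> Matrix:
    return [row[:] for row in m]


class PredictorNode:
    """One predictor; its (assumed) model is an event-type transition matrix."""

    def __init__(self, n_types: int, lr: float) -> None:
        self.model: Matrix = [[1.0 / n_types] * n_types for _ in range(n_types)]
        self.reference: Matrix = copy_matrix(self.model)  # last synchronized model
        self.lr = lr
        self.last: Optional[int] = None

    def observe(self, event: int) -> None:
        # Online update of row P[last]: move it toward the one-hot vector of the
        # observed next event, which keeps every row a probability distribution.
        if self.last is not None:
            row = self.model[self.last]
            for j in range(len(row)):
                target = 1.0 if j == event else 0.0
                row[j] += self.lr * (target - row[j])
        self.last = event

    def drift(self) -> float:
        # Squared Euclidean distance between the local and the reference model.
        return sum((m - r) ** 2
                   for m_row, r_row in zip(self.model, self.reference)
                   for m, r in zip(m_row, r_row))


class Coordinator:
    """Averages the local models only when some node's drift exceeds delta."""

    def __init__(self, nodes: List[PredictorNode], delta: float) -> None:
        self.nodes = nodes
        self.delta = delta
        self.syncs = 0
        self.model_transfers = 0

    def step(self, events: List[int]) -> None:
        # One new event per node (i.e. per stream) in this round.
        for node, event in zip(self.nodes, events):
            node.observe(event)
        # Local conditions: communication happens only when one is violated.
        if any(node.drift() > self.delta for node in self.nodes):
            self.synchronize()

    def synchronize(self) -> None:
        k = len(self.nodes)
        n = len(self.nodes[0].model)
        avg = [[sum(node.model[i][j] for node in self.nodes) / k for j in range(n)]
               for i in range(n)]
        for node in self.nodes:
            node.model = copy_matrix(avg)
            node.reference = copy_matrix(avg)
        self.syncs += 1
        self.model_transfers += 2 * k  # k uploads to the coordinator + k broadcasts


stream = [t % 2 for t in range(50)]  # alternating event types 0, 1, 0, 1, ...
coord = Coordinator([PredictorNode(2, lr=0.2) for _ in range(3)], delta=0.05)
for e in stream:
    coord.step([e, e, e])
print(coord.syncs, "synchronizations in", len(stream), "rounds")
```

Setting `delta=0.0` degenerates to averaging after every update, while a larger `delta` trades model agreement for fewer model transfers. With heterogeneous streams (e.g. different vessels), the averaged model pools evidence from all nodes, which is one way to read "learning from each other"; whether plain averaging suits a given set of streams depends on how similar their dynamics are.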
