Recurrent Marked Temporal Point Processes: Embedding Event History to Vector

Large volumes of event data are becoming increasingly available in a wide variety of applications, such as healthcare analytics, smart cities and social network analysis. The precise time interval or the exact distance between two events carries a great deal of information about the dynamics of the underlying systems. These characteristics make such data fundamentally different from independently and identically distributed data and from time-series data, where time and space are treated as indexes rather than random variables. Marked temporal point processes are the mathematical framework for modeling event data with covariates. However, typical point process models make strong assumptions about the generative process of the event data, which may not reflect reality, and their fixed parametric forms restrict the expressive power of the resulting processes. Can we obtain a more expressive model of marked temporal point processes? How can we learn such a model from massive data? In this paper, we propose the Recurrent Marked Temporal Point Process (RMTPP) to simultaneously model the event timings and the markers. The key idea of our approach is to view the intensity function of a temporal point process as a nonlinear function of the history, and to use a recurrent neural network to automatically learn a representation of the influences from the event history. We develop an efficient stochastic gradient algorithm for learning the model parameters, which can readily scale up to millions of events. Using both synthetic and real-world datasets, we show that, when the true models have parametric specifications, RMTPP can learn the dynamics of such models without knowing the actual parametric forms; and when the true models are unknown, RMTPP can still learn the dynamics and achieve better predictive performance than parametric alternatives based on particular prior assumptions.
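
The key idea stated in the abstract, an intensity function driven by a recurrent embedding of the event history, can be illustrated with a small sketch. This is not the paper's implementation: the tanh recurrence, the exponential form of the intensity, the parameter names (`W_h`, `W_y`, `w_t`, `v`, `w`, `b`) and the random initialization are all assumptions chosen for illustration, and nothing here is trained.

```python
import math
import random

def init_params(hidden, n_markers, seed=0):
    """Random, untrained parameters (hypothetical names, for illustration only)."""
    rng = random.Random(seed)
    u = lambda: rng.uniform(-0.1, 0.1)
    return {
        "W_h": [[u() for _ in range(hidden)] for _ in range(hidden)],
        "W_y": [[u() for _ in range(n_markers)] for _ in range(hidden)],
        "w_t": [u() for _ in range(hidden)],
        "b_h": [u() for _ in range(hidden)],
        "v": [u() for _ in range(hidden)],
        "w": 0.5,   # assumed nonzero so the intensity integral has a closed form
        "b": 0.0,
    }

def rnn_step(h, marker, dt, params):
    """Fold one event (marker index, inter-event time dt) into the history embedding h."""
    new_h = []
    for i in range(len(h)):
        pre = params["b_h"][i] + params["W_y"][i][marker] + params["w_t"][i] * dt
        pre += sum(params["W_h"][i][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(pre))
    return new_h

def base_log_rate(h, params):
    """History-dependent part of the log-intensity: v . h + b."""
    return sum(v_i * h_i for v_i, h_i in zip(params["v"], h)) + params["b"]

def intensity(h, elapsed, params):
    """Assumed intensity exp(v . h + w * elapsed + b), with elapsed = time since the last event."""
    return math.exp(base_log_rate(h, params) + params["w"] * elapsed)

def log_density(h, gap, params):
    """Log-density of the next inter-event time: log intensity minus the integrated intensity."""
    a, w = base_log_rate(h, params), params["w"]
    integral = (math.exp(a) / w) * (math.exp(w * gap) - 1.0)
    return a + w * gap - integral

# Usage: embed a short history of (marker, inter-event time) pairs, then score a gap.
params = init_params(hidden=4, n_markers=3)
h = [0.0] * 4
for marker, dt in [(0, 0.5), (2, 1.2), (1, 0.3)]:
    h = rnn_step(h, marker, dt, params)
print(intensity(h, 0.7, params), log_density(h, 0.7, params))
```

Under these assumptions the intensity depends on the history only through the vector `h`, so no parametric form such as a Hawkes kernel is specified by hand; fitting the parameters by stochastic gradient ascent on the summed `log_density` terms would be the natural next step.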
