Universal transformer Hawkes process with adaptive recursive iteration

Asynchronous event sequences are ubiquitous in the natural world and in human activity, e.g., earthquake records and users' activities on social media. How to distill information from such seemingly disorganized data is a persistent research topic. One of the most useful tools is the point process model, and on this basis researchers have obtained many notable results. In recent years, point process models built on neural networks, especially recurrent neural networks (RNNs), have been proposed, and compared with traditional models their performance is greatly improved. Inspired by the transformer, which learns sequential data efficiently without recurrent or convolutional structures, the transformer Hawkes process was introduced and achieves state-of-the-art performance. However, some research has shown that reintroducing recursive computation into the transformer can further improve its performance. We therefore propose a new transformer Hawkes process model, the universal transformer Hawkes process (UTHP), which combines a recursive mechanism with self-attention; to improve the model's local perception ability, we also introduce a convolutional neural network (CNN) in the position-wise feed-forward part. We conduct experiments on several datasets to validate the effectiveness of UTHP and to explore the changes brought about by the recursive mechanism. These experiments demonstrate that our proposed model yields a measurable improvement over the previous state-of-the-art models.
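For context, a Hawkes process models the conditional intensity of events as a baseline rate plus the self-exciting influence of past events, roughly λ(t) = μ + Σ_{t_i < t} φ(t − t_i). The sketch below is a minimal, hypothetical PyTorch illustration of the architectural idea described above: one shared encoder block (self-attention followed by a convolutional position-wise feed-forward sublayer) applied recursively in depth, universal-transformer style. All names and hyperparameters are illustrative, the adaptive halting of the recursion is replaced here by a fixed number of iterations, and this is not the authors' implementation.

```python
import torch
import torch.nn as nn

class UniversalHawkesBlock(nn.Module):
    """One shared encoder block applied recursively over depth.

    Each iteration: self-attention, then a convolutional position-wise
    feed-forward sublayer, each with a residual connection and layer norm.
    """

    def __init__(self, d_model=64, n_heads=4, d_ff=256, kernel_size=3, n_steps=4):
        super().__init__()
        self.n_steps = n_steps  # fixed iteration count stands in for adaptive halting
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # 1-D convolutions replace the usual linear layers of the FFN,
        # giving the model local perception over neighboring events.
        self.conv1 = nn.Conv1d(d_model, d_ff, kernel_size, padding=kernel_size // 2)
        self.conv2 = nn.Conv1d(d_ff, d_model, kernel_size, padding=kernel_size // 2)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x, attn_mask=None):
        # The same parameters are reused at every step: recursion in depth.
        for _ in range(self.n_steps):
            a, _ = self.attn(x, x, x, attn_mask=attn_mask)
            x = self.norm1(x + a)
            # Conv1d expects (batch, channels, length), hence the transposes.
            f = self.conv2(torch.relu(self.conv1(x.transpose(1, 2)))).transpose(1, 2)
            x = self.norm2(x + f)
        return x

# Usage: encode a batch of 2 event-embedding sequences of length 10,
# with a causal mask so an event cannot attend to future events.
h = torch.randn(2, 10, 64)
mask = torch.triu(torch.ones(10, 10), 1).bool()
out = UniversalHawkesBlock()(h, attn_mask=mask)
print(out.shape)  # torch.Size([2, 10, 64])
```

The encoded hidden states would then feed an intensity head for likelihood-based training, as in earlier neural Hawkes models; that head is omitted here.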
