Discrete Event, Continuous Time RNNs

We investigate recurrent neural network architectures for event-sequence processing. Event sequences, characterized by discrete observations stamped with continuous-valued times of occurrence, are challenging because of the potentially wide dynamic range of relevant time scales and the interactions between those scales. We describe four forms of inductive bias that should benefit architectures for event sequences: temporal locality, position homogeneity, scale homogeneity, and scale interdependence. We extend the popular gated recurrent unit (GRU) architecture to incorporate these biases via intrinsic temporal dynamics, obtaining a continuous-time GRU (CT-GRU). The CT-GRU arises from interpreting the gates of a GRU as selecting a time scale of memory, and it generalizes the GRU by maintaining multiple time scales of memory and performing context-dependent selection of time scales for information storage and retrieval. Event time stamps drive the decay dynamics of the CT-GRU, whereas they serve only as generic additional inputs to the GRU. Despite the very different ways in which the two models treat time, their performance on the eleven data sets we examined is essentially identical. This surprising result points both to the robustness of GRU and LSTM architectures in handling continuous time and to the potency of incorporating continuous dynamics into neural architectures.
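
The abstract describes the CT-GRU mechanism but not its update equations. As a rough illustration, the following NumPy sketch shows a CT-GRU-style recurrence under stated assumptions: memory is split into traces with fixed, log-spaced time constants; storage and retrieval scales are soft-selected per hidden unit by a squared-distance softmax in log-time-constant space; and each trace decays exponentially with the elapsed time dt between events. All names here (ct_gru_step, scale_softmax, the weight matrices) are illustrative, not taken from the paper.

```python
import numpy as np

def log_spaced_taus(n_scales, tau_min=1.0, tau_max=1000.0):
    """Fixed, logarithmically spaced time constants, one per memory trace."""
    return np.logspace(np.log10(tau_min), np.log10(tau_max), n_scales)

def scale_softmax(log_tau_desired, log_taus):
    """Softly assign each unit's desired (log) time scale to the fixed scales."""
    logits = -(log_taus[:, None] - log_tau_desired[None, :]) ** 2
    logits -= logits.max(axis=0, keepdims=True)      # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=0, keepdims=True)          # (n_scales, hidden)

def ct_gru_step(x, traces, dt, params, taus):
    """One event update; traces is (n_scales, hidden), dt is time since last event."""
    Wr, Ur, br, Ws, Us, bs, Wq, Uq, bq = params
    log_taus = np.log(taus)
    h = traces.sum(axis=0)                           # aggregate hidden state
    # Retrieval: choose the time scales from which to read, per hidden unit.
    r = scale_softmax(x @ Wr + h @ Ur + br, log_taus)
    h_read = (r * traces).sum(axis=0)
    # Candidate state computed from the retrieved memory
    # (retrieval plays a role analogous to a GRU's reset gate).
    q = np.tanh(x @ Wq + h_read @ Uq + bq)
    # Storage: choose the time scales into which to write the candidate.
    s = scale_softmax(x @ Ws + h @ Us + bs, log_taus)
    # Blend the candidate into each trace, then decay each trace over dt.
    return ((1.0 - s) * traces + s * q[None, :]) * np.exp(-dt / taus)[:, None]

# Toy usage: random weights, three events with irregular inter-event intervals.
rng = np.random.default_rng(0)
in_dim, hidden, n_scales = 4, 8, 5
taus = log_spaced_taus(n_scales)
params = [rng.normal(scale=0.1, size=shape)
          for shape in [(in_dim, hidden), (hidden, hidden), (hidden,)] * 3]
traces = np.zeros((n_scales, hidden))
for i, dt in enumerate([0.5, 3.0, 40.0]):
    x = np.eye(in_dim)[i % in_dim]                   # one-hot event marker
    traces = ct_gru_step(x, traces, dt, params, taus)
print(traces.sum(axis=0))                            # aggregate state after the stream
```

By contrast, a plain GRU would handle the same stream by appending dt (or some encoding of it) to x and letting the learned gates absorb the timing; the CT-GRU instead lets dt act directly on the memory's decay.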
