Inference, Prediction, & Entropy-Rate Estimation of Continuous-Time, Discrete-Event Processes

Inferring models, predicting the future, and estimating the entropy rate of discrete-time, discrete-event processes is well-worn ground. However, a much broader class of discrete-event processes operates in continuous time. Here, we provide new methods for inferring models of, predicting, and estimating the entropy rate of such continuous-time, discrete-event processes. The methods rely on an extension of Bayesian structural inference that takes advantage of neural networks' universal approximation power. In experiments with complex synthetic data, the methods are competitive with the state of the art for prediction and entropy-rate estimation.
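To make the core idea concrete, the sketch below (not the authors' implementation) uses a small neural density model for the dwell times of a continuous-time, discrete-event process and plugs the learned density into an entropy-rate estimate; for a renewal process the entropy rate per unit time is the differential entropy of the dwell-time distribution divided by its mean. The class and function names (DwellTimeDensity, entropy_rate_estimate), the mixture-of-log-normals parameterization, and the exponential toy data are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch: neural density estimation of dwell times plus a plug-in
# entropy-rate estimate for a renewal-like continuous-time, discrete-event process.
import torch
import torch.nn as nn

class DwellTimeDensity(nn.Module):
    """Mixture-of-log-normals density over positive dwell times (illustrative choice)."""
    def __init__(self, n_components: int = 8):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(n_components))
        self.means = nn.Parameter(torch.randn(n_components))
        self.log_scales = nn.Parameter(torch.zeros(n_components))

    def log_prob(self, tau: torch.Tensor) -> torch.Tensor:
        # log p(tau) for tau > 0 under a mixture of log-normal components.
        log_tau = torch.log(tau).unsqueeze(-1)                       # shape (N, 1)
        comp = torch.distributions.Normal(self.means, self.log_scales.exp())
        log_w = torch.log_softmax(self.logits, dim=0)                # mixture weights
        # Change of variables: p(tau) = p_normal(log tau) / tau.
        return torch.logsumexp(log_w + comp.log_prob(log_tau), dim=-1) - torch.log(tau)

def fit(density: DwellTimeDensity, dwell_times: torch.Tensor, steps: int = 2000):
    """Maximum-likelihood fit of the density to observed dwell times."""
    opt = torch.optim.Adam(density.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss = -density.log_prob(dwell_times).mean()
        loss.backward()
        opt.step()
    return density

def entropy_rate_estimate(density: DwellTimeDensity, dwell_times: torch.Tensor) -> float:
    """Plug-in renewal-process estimate: h = H[tau] / E[tau] (nats per unit time)."""
    with torch.no_grad():
        h_tau = -density.log_prob(dwell_times).mean()
        return (h_tau / dwell_times.mean()).item()

if __name__ == "__main__":
    torch.manual_seed(0)
    # Toy data: exponential dwell times with rate 1; the true differential entropy
    # is 1 nat and the mean is 1, so the estimate should be close to 1 nat/unit time.
    taus = torch.distributions.Exponential(1.0).sample((5000,))
    model = fit(DwellTimeDensity(), taus)
    print("estimated entropy rate:", entropy_rate_estimate(model, taus))
```

The mixture-of-log-normals model stands in for whichever flexible (universal-approximation) density model one prefers; the same fit-then-plug-in structure applies when the network also conditions on the preceding event history rather than treating dwell times as independent.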
