Noise-Contrastive Estimation for Multivariate Point Processes

The log-likelihood of a generative model often involves both positive and negative terms. For a temporal multivariate point process, the negative term sums over all possible event types at each time and also integrates over all possible times. As a result, maximum likelihood estimation is expensive. We show how to instead apply a version of noise-contrastive estimation, a general parameter estimation method with a less expensive stochastic objective. Our specific instantiation of this general idea works out in an interestingly non-trivial way and has provable guarantees for its optimality, consistency and efficiency. On several synthetic and real-world datasets, our method shows benefits: for the model to achieve the same level of log-likelihood on held-out data, our method needs considerably fewer function evaluations and less wall-clock time.
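To make the cost of the negative term concrete, the following is a minimal sketch of the standard point-process log-likelihood: the sum of log-intensities at the observed events, minus the integral over time of the total intensity summed over all event types. It illustrates the expensive maximum-likelihood objective described above, not the noise-contrastive objective itself. The names `log_likelihood`, `intensity`, `constant_intensity`, and `rates` are hypothetical, and the uniform Monte Carlo estimate of the integral is an assumption chosen for simplicity.

```python
import math
import random


def log_likelihood(events, intensity, num_types, horizon, num_samples=1000, seed=0):
    """Estimate the log-likelihood of an event sequence on [0, horizon].

    events: list of (time, type) pairs sorted by time.
    intensity: hypothetical callable intensity(k, t, history) -> positive float.
    """
    rng = random.Random(seed)

    # Positive term: log-intensity of each observed event given its history.
    positive = 0.0
    for i, (t, k) in enumerate(events):
        positive += math.log(intensity(k, t, events[:i]))

    # Negative term: Monte Carlo estimate of the integral over [0, horizon]
    # of the total intensity. Each sampled time needs num_types intensity
    # evaluations, so the cost is num_samples * num_types.
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(0.0, horizon)
        history = [e for e in events if e[0] < t]
        total += sum(intensity(k, t, history) for k in range(num_types))
    negative = horizon * total / num_samples

    return positive - negative


# Toy usage with a history-independent (homogeneous Poisson) intensity.
rates = [1.0, 2.0]


def constant_intensity(k, t, history):
    return rates[k]


events = [(0.5, 0), (1.2, 1)]
ll = log_likelihood(events, constant_intensity, num_types=2, horizon=2.0)
```

With a constant intensity the integral is exact, so `ll` equals `log(1.0) + log(2.0) - 2.0 * 3.0`. For a neural intensity function, each of the `num_samples * num_types` evaluations in the negative term is a model call, which is the expense the abstract refers to.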
