Row-clustering of a Point Process-valued Matrix

Structured point process data harvested from various platforms poses new challenges to the machine learning community. To cluster repeatedly observed marked point processes, we propose a novel mixture model of multi-level marked point processes for identifying potential heterogeneity in the observed data. Specifically, we study a matrix whose entries are marked log-Gaussian Cox processes and cluster rows of such a matrix. An efficient semi-parametric Expectation-Solution (ES) algorithm combined with functional principal component analysis (FPCA) of point processes is proposed for model estimation. The effectiveness of the proposed framework is demonstrated through simulation studies and real data analyses.
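The abstract names the data structure but does not show it. The sketch below is a rough illustration of what a row-clustered, point-process-valued matrix can look like. It is not the authors' model, the ES algorithm, or FPCA. It rests on several assumptions:

- each row carries a latent cluster label `k`;
- entry (i, j) is a marked log-Gaussian Cox process on [0, T] with intensity exp(mu_kj(t) + Z(t));
- Z is a stationary Ornstein-Uhlenbeck process (exponential covariance);
- the intensity is approximated as piecewise constant on a grid;
- marks are independent Gaussian draws whose mean depends on the cluster.

All function and parameter names (`ou_path`, `simulate_marked_lgcp`, `simulate_matrix`, `rho`, `sigma`, `mark_mean`) are hypothetical.

```python
import math
import random


def ou_path(n_bins, dt, rho, sigma, rng):
    """Stationary Ornstein-Uhlenbeck path on a regular grid (exact recursion).

    The values are jointly Gaussian with covariance sigma^2 * exp(-|s - t| / rho),
    used here as an assumed latent Gaussian process Z(t).
    """
    phi = math.exp(-dt / rho)
    x = rng.gauss(0.0, sigma)  # draw the start from the stationary law
    path = [x]
    for _ in range(n_bins - 1):
        x = phi * x + rng.gauss(0.0, sigma * math.sqrt(1.0 - phi * phi))
        path.append(x)
    return path


def poisson(lam, rng):
    """Knuth's multiplication sampler; adequate for the small per-bin means here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_marked_lgcp(mean_fn, T, n_bins, rho, sigma, mark_mean, rng):
    """One marked LGCP on [0, T] with intensity exp(mean_fn(t) + Z(t)).

    The intensity is approximated as piecewise constant on n_bins bins; given a
    bin's count, event times are uniform within that bin. Marks are independent
    N(mark_mean, 1) draws (a hypothetical mark model).
    """
    dt = T / n_bins
    z = ou_path(n_bins, dt, rho, sigma, rng)
    events = []
    for b in range(n_bins):
        # Expected number of events in bin b under the piecewise-constant intensity.
        expected = math.exp(mean_fn((b + 0.5) * dt) + z[b]) * dt
        for _ in range(poisson(expected, rng)):
            t = (b + rng.random()) * dt
            events.append((t, rng.gauss(mark_mean, 1.0)))
    events.sort()  # (time, mark) pairs in time order
    return events


def simulate_matrix(n_rows, cluster_means, T=10.0, n_bins=100, seed=0):
    """All entries of a row share one latent cluster label k; entry (i, j) is a
    marked LGCP whose log-intensity mean function is cluster_means[k][j]."""
    rng = random.Random(seed)
    labels, matrix = [], []
    for _ in range(n_rows):
        k = rng.randrange(len(cluster_means))
        labels.append(k)
        matrix.append([
            simulate_marked_lgcp(mu, T, n_bins, rho=1.0, sigma=0.5,
                                 mark_mean=float(k), rng=rng)
            for mu in cluster_means[k]
        ])
    return labels, matrix


if __name__ == "__main__":
    cluster_means = [
        [lambda t: math.log(2.0), lambda t: 0.0],                      # cluster 0
        [lambda t: math.log(2.0) + math.sin(2 * math.pi * t / 10.0),  # cluster 1
         lambda t: math.log(3.0)],
    ]
    labels, matrix = simulate_matrix(6, cluster_means, seed=1)
    for k, row in zip(labels, matrix):
        print(k, [len(cell) for cell in row])  # event counts per column
```

Running the script prints each row's latent label and its per-column event counts. A row-clustering method would have to recover those labels from the matrix of marked event sequences alone. The paper's method for doing so is not reproduced here.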
