A Framework for Mining Signatures from Event Sequences and Its Applications in Healthcare Data

This paper proposes a novel temporal knowledge representation and learning framework to perform large-scale temporal signature mining of longitudinal heterogeneous event data. The framework enables the representation, extraction, and mining of high-order latent event structure and relationships within single and multiple event sequences. The proposed knowledge representation maps the heterogeneous event sequences to a geometric image by encoding events as a structured spatial-temporal shape process. We present a doubly constrained convolutional sparse coding framework that learns interpretable and shift-invariant latent temporal event signatures. We show how to cope with the sparsity in the data as well as in the latent factor model by inducing a double sparsity constraint on the β-divergence to learn an overcomplete sparse latent factor model. A novel stochastic optimization scheme performs large-scale incremental learning of group-specific temporal event signatures. We validate the framework on synthetic data and on an electronic health record dataset.

[1]  Qiang Yang,et al.  Detect and Track Latent Factors with Online Nonnegative Matrix Factorization , 2007, IJCAI.

[2]  Fei Wang,et al.  Efficient Document Clustering via Online Nonnegative Matrix Factorizations , 2011, SDM.

[3]  J. Eggert,et al.  Sparse coding and NMF , 2004, 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No.04CH37541).

[4]  Lexing Xie,et al.  Event Mining in Multimedia Streams , 2008, Proceedings of the IEEE.

[5]  Patrik O. Hoyer,et al.  Non-negative sparse coding , 2002, Proceedings of the 12th IEEE Workshop on Neural Networks for Signal Processing.

[6]  H. Sebastian Seung,et al.  Learning the parts of objects by non-negative matrix factorization , 1999, Nature.

[7]  R. Andrew Russell Mobile Robot Learning by Self-Observation , 2004, Auton. Robots.

[8]  Patrik O. Hoyer,et al.  Non-negative Matrix Factorization with Sparseness Constraints , 2004, J. Mach. Learn. Res..

[9]  Ming Dong,et al.  A Tutorial on Nonlinear Time-Series Data Mining in Engineering Asset Health and Reliability Prediction: Concepts, Models, and Algorithms , 2010 .

[10]  Chris H. Q. Ding,et al.  Convex and Semi-Nonnegative Matrix Factorizations , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Jérôme Idier,et al.  Algorithms for nonnegative matrix factorization with the beta-divergence , 2010, ArXiv.

[12]  Jérôme Idier,et al.  Algorithms for Nonnegative Matrix Factorization with the β-Divergence , 2010, Neural Computation.

[13]  Fabian Mörchen,et al.  Algorithms for time series knowledge mining , 2006, KDD '06.

[14]  Paris Smaragdis,et al.  Non-negative Matrix Factor Deconvolution; Extraction of Multiple Sound Sources from Monophonic Inputs , 2004, ICA.

[15]  Barak A. Pearlmutter,et al.  Discovering Convolutive Speech Phones Using Sparseness and Non-negativity , 2007, ICA.

[16]  Guillermo Sapiro,et al.  Online Learning for Matrix Factorization and Sparse Coding , 2009, J. Mach. Learn. Res..

[17]  W. Katon,et al.  Diabetes complications severity index and risk of mortality, hospitalization, and healthcare utilization. , 2008, The American journal of managed care.

[18]  Fabian Mörchen,et al.  Efficient mining of understandable patterns from multivariate interval time series , 2007, Data Mining and Knowledge Discovery.

[19]  Dmitriy Fradkin,et al.  Robust Mining of Time Intervals with Semi-interval Partial Order Patterns , 2010, SDM.

[20]  Eamonn J. Keogh,et al.  A symbolic representation of time series, with implications for streaming algorithms , 2003, DMKD '03.

[21]  U. Feige,et al.  Spectral Graph Theory , 2015 .

[22]  Jonathon Shlens,et al.  The Structure of Large-Scale Synchronized Firing in Primate Retina , 2009, The Journal of Neuroscience.