VAMBC: A Variational Approach for Mobility Behavior Clustering

Many domains, including policymaking, urban design, and geospatial intelligence, benefit from understanding people’s mobility behaviors (e.g., work commute, shopping). These behaviors can be inferred by clustering massive trajectory collections using the geo-context around the visited locations, where each trajectory is represented as a context sequence (i.e., a sequence of vectors, each describing the geographic environment near a visited location). However, existing approaches for clustering sequential data are not effective at clustering these context sequences based on the contexts’ transition patterns. They either rely on traditional pre-defined similarities tailored to specific application requirements or use a two-phase autoencoder-based deep learning process, which is not robust to training variations. We therefore propose VAMBC, a variational approach for clustering context sequences that learns the self-supervision and the cluster assignments simultaneously in a single phase, in order to infer mobility behaviors from the context transitions in trajectories. Our experiments show that VAMBC significantly outperforms state-of-the-art approaches in both the robustness and the accuracy of clustering mobility behaviors in trajectories.
