SenseGen: A deep learning architecture for synthetic sensor data generation

Our ability to synthesize sensory data that preserves specific statistical properties of the real data has had tremendous implications on data privacy and big data analytics. The synthetic data can be used as a substitute for selected real data segments that are sensitive to the user, thus protecting privacy and resulting in improved analytics. However, increasingly adversarial roles taken by data recipients, such as mobile apps or other cloud-based analytics services, mandate that the synthetic data, in addition to preserving statistical properties, should also be “difficult” to distinguish from the real data. Typically, visual inspection has been used as a test to distinguish between datasets. More recently, sophisticated classifier models (discriminators), corresponding to a set of events, have also been employed to distinguish between synthesized and real data. Such a model operates on both datasets, and the respective event outputs are compared for consistency. Prior work on data synthesis has often focused on classifiers that are built for features explicitly preserved by the synthetic data. This suggests that an adversary can build classifiers that exploit a potentially disjoint set of features to differentiate between the two datasets. In this paper, we take a step towards generating sensory data that can pass a deep learning based discriminator model test, and make two specific contributions. First, we present a deep learning based architecture for synthesizing sensory data. This architecture comprises a generator model, which is a stack of multiple Long Short-Term Memory (LSTM) networks and a Mixture Density Network (MDN). Second, we use another LSTM network based discriminator model for distinguishing between the true and the synthesized data.
Using a dataset of accelerometer traces, collected using smartphones of users doing their daily activities, we show that the deep learning based discriminator model can only distinguish between the real and synthesized traces with an accuracy in the neighborhood of 50%.
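
The role of the Mixture Density Network in the generator can be illustrated with a minimal sketch. It assumes a one-dimensional Gaussian mixture whose parameters are produced from unconstrained network outputs. The function names (`mdn_params`, `mdn_nll`, `mdn_sample`), the two-component mixture, and the numeric values are hypothetical choices for illustration and are not taken from the paper. The LSTM stack that would produce the raw outputs at each time step is omitted.

```python
import math
import random


def mdn_params(raw_weights, raw_means, raw_log_sigmas):
    """Turn unconstrained network outputs into valid mixture parameters."""
    # Softmax (shifted by the max for numerical stability) yields weights summing to 1.
    shift = max(raw_weights)
    exps = [math.exp(w - shift) for w in raw_weights]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Exponentiating keeps every standard deviation strictly positive.
    sigmas = [math.exp(s) for s in raw_log_sigmas]
    return weights, list(raw_means), sigmas


def mdn_nll(x, weights, means, sigmas):
    """Negative log-likelihood of a scalar x under the Gaussian mixture."""
    density = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        z = (x - mu) / sigma
        density += w * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
    # The small constant guards against log(0) for very unlikely values.
    return -math.log(density + 1e-12)


def mdn_sample(weights, means, sigmas, rng):
    """Draw one value: pick a component by weight, then sample its Gaussian."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[k], sigmas[k])


# Hypothetical raw outputs for a single time step (illustrative values only).
rng = random.Random(0)
weights, means, sigmas = mdn_params([0.0, 0.0], [-1.0, 1.0], [-2.0, -2.0])
synthetic_trace = [mdn_sample(weights, means, sigmas, rng) for _ in range(5)]
loss_near_mode = mdn_nll(-1.0, weights, means, sigmas)
loss_far_away = mdn_nll(5.0, weights, means, sigmas)
```

In a setup of this kind, a loss such as `mdn_nll` would be minimized on real sensor values during training, and a routine such as `mdn_sample` would be called step by step to emit a synthetic trace. Sampling from the mixture, rather than emitting a single predicted value, is what lets the generated traces vary from run to run.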
