DeepNAG: Deep Non-Adversarial Gesture Generation

Synthetic data generation to improve classification performance (data augmentation) is a well-studied problem. Recently, generative adversarial networks (GANs) have shown superior performance for image data augmentation, but their suitability for gesture synthesis has received little attention. Further, GANs are costly to train because the generator and discriminator networks must be trained simultaneously. We tackle both issues in this work. We first discuss a novel, device-agnostic GAN model for gesture synthesis called DeepGAN. Thereafter, we formulate DeepNAG by introducing a new differentiable loss function based on dynamic time warping and the average Hausdorff distance, which allows us to train DeepGAN's generator without requiring a discriminator. Through evaluations, we compare the utility of DeepGAN and DeepNAG against two alternative techniques for training five recognizers using data augmentation over six datasets. We further investigate the perceived quality of synthesized samples via an Amazon Mechanical Turk user study based on the HYPE benchmark. We find that DeepNAG outperforms DeepGAN in accuracy, training time (up to 17x faster), and realism, thereby opening the door to a new line of research in generator network design and training for gesture synthesis. Our source code is available at this https URL.
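
To make the two ingredients of the loss concrete, the following is a minimal NumPy sketch of a soft (smoothed) dynamic time warping cost and the average Hausdorff distance between two gestures, each treated as a sequence of points. It is an illustration under stated assumptions, not the authors' implementation: the function names, the squared Euclidean point cost, the smoothing parameter `gamma`, and the choice of a soft-minimum recursion to make DTW differentiable are all assumptions, and how the paper combines the two terms into a single loss is not shown here.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between all points of a (n, d) and b (m, d)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def average_hausdorff(a, b):
    """Average Hausdorff distance: mean nearest-neighbor distance in both directions."""
    d = np.sqrt(pairwise_sq_dists(a, b))
    return d.min(axis=1).mean() + d.min(axis=0).mean()


def soft_min(values, gamma):
    """Smooth minimum, -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = -np.asarray(values, dtype=float) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))


def soft_dtw(a, b, gamma=1.0):
    """DTW recursion with the hard minimum replaced by a smooth minimum.

    Assumption: squared Euclidean cost between points; gamma > 0 controls
    the smoothing, and small gamma approaches classic DTW.
    """
    cost = pairwise_sq_dists(a, b)
    n, m = cost.shape
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [r[i - 1, j], r[i, j - 1], r[i - 1, j - 1]]
            r[i, j] = cost[i - 1, j - 1] + soft_min(prev, gamma)
    return r[n, m]


# Toy 2D gestures: a horizontal stroke and the same stroke shifted up by one unit.
real = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
fake = real + np.array([0.0, 1.0])

ahd = average_hausdorff(real, fake)
sdtw = soft_dtw(real, fake, gamma=1e-3)
```

In a training setting, the same computations would be written in an automatic differentiation framework so that gradients flow back to the generator; the NumPy version above only illustrates the quantities being compared.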