Unsupervised Magnification of Posture Deviations Across Subjects

Analyzing human posture and precisely comparing it across different subjects is essential for accurate understanding of behavior and numerous vision applications such as medical diagnostics, sports, or surveillance. Motion magnification techniques help to see even small deviations in posture that are invisible to the naked eye. However, they fail when comparing subtle posture differences across individuals with diverse appearance. Keypoint-based posture estimation and classification techniques can handle large variations in appearance, but are invariant to subtle deviations in posture. We present an approach to unsupervised magnification of posture differences across individuals despite large deviations in appearance. We do not require keypoint annotation and visualize deviations on a sub-bodypart level. To transfer appearance across subjects onto a magnified posture, we propose a novel loss for disentangling appearance and posture in an autoencoder. Posture magnification yields exaggerated images that are different from the training set. Therefore, we incorporate magnification already into the training of the disentangled autoencoder and learn on real data and synthesized magnifications without supervision. Experiments confirm that our approach improves upon the state-of-the-art in magnification and on the application of discovering posture deviations due to impairment.

[1]  M. N. Doja,et al.  Detecting distraction of drivers using Convolutional Neural Network , 2020, Pattern Recognit. Lett..

[2]  Frans Coenen,et al.  Driving Posture Recognition by Joint Application of Motion History Image and Pyramid Histogram of Oriented Gradients , 2013 .

[3]  Silvia L. Pintea,et al.  Video Acceleration Magnification , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Nicu Sebe,et al.  Self-Adaptive Matrix Completion for Heart Rate Estimation from Face Videos under Realistic Conditions , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Frédo Durand,et al.  Learning-based Video Motion Magnification , 2018, ECCV.

[6]  Jan Kautz,et al.  Multimodal Unsupervised Image-to-Image Translation , 2018, ECCV.

[7]  Frédo Durand,et al.  Deviation magnification , 2015, ACM Trans. Graph..

[8]  Björn Ommer,et al.  A Variational U-Net for Conditional Appearance and Shape Generation , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[9]  Björn Ommer,et al.  Deep unsupervised learning of visual similarities , 2018, Pattern Recognit..

[10]  Maneesh Kumar Singh,et al.  DRIT++: Diverse Image-to-Image Translation via Disentangled Representations , 2019, International Journal of Computer Vision.

[11]  Frédo Durand,et al.  Phase-based video motion processing , 2013, ACM Trans. Graph..

[12]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[13]  Frédo Durand,et al.  Motion magnification , 2005, ACM Trans. Graph..

[14]  Björn Ommer,et al.  LSTM Self-Supervision for Detailed Behavior Analysis , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15]  Ahmed M. Elgammal,et al.  Separating style and content on a nonlinear manifold , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[16]  Björn Ommer,et al.  Improving Spatiotemporal Self-Supervision by Deep Reinforcement Learning , 2018, ECCV.

[17]  Bernt Schiele,et al.  PoseTrack: A Benchmark for Human Pose Estimation and Tracking , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Björn Ommer,et al.  A Style-Aware Content Loss for Real-time HD Style Transfer , 2018, ECCV.

[19]  Bernt Schiele,et al.  DeeperCut: A Deeper, Stronger, and Faster Multi-person Pose Estimation Model , 2016, ECCV.

[20]  Björn Ommer,et al.  Deep Unsupervised Similarity Learning Using Partially Ordered Sets , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Zoubin Ghahramani,et al.  Factorial Learning and the EM Algorithm , 1994, NIPS.

[22]  Frédo Durand,et al.  Riesz pyramids for fast phase-based video magnification , 2014, 2014 IEEE International Conference on Computational Photography (ICCP).

[23]  Li Fei-Fei,et al.  Perceptual Losses for Real-Time Style Transfer and Super-Resolution , 2016, ECCV.

[24]  Sepp Hochreiter,et al.  GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium , 2017, NIPS.

[25]  Matthias Zwicker,et al.  Disentangling Factors of Variation by Mixing Them , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[26]  Davide Modolo,et al.  Action Recognition With Spatial-Temporal Discriminative Filter Banks , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[27]  Ming Yang,et al.  3D Convolutional Neural Networks for Human Action Recognition , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  William T. Freeman,et al.  Revealing and modifying non-local variations in a single image , 2015, ACM Trans. Graph..

[29]  Björn Ommer,et al.  Unsupervised Part-Based Disentangling of Object Shape and Appearance , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[30]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[31]  Rainer Stiefelhagen,et al.  End-to-end Prediction of Driver Intention using 3D Convolutional Neural Networks , 2019, 2019 IEEE Intelligent Vehicles Symposium (IV).

[32]  Rangachar Kasturi,et al.  A Robust Algorithm for Text String Separation from Mixed Text/Graphics Images , 1988, IEEE Trans. Pattern Anal. Mach. Intell..

[33]  Lior Wolf,et al.  A Two-Step Disentanglement Method , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[34]  Ilya Kostrikov,et al.  An Efficient Convolutional Network for Human Pose Estimation , 2016, BMVC.

[35]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[36]  Vighnesh Birodkar,et al.  Unsupervised Learning of Disentangled Representations from Video , 2017, NIPS.

[37]  Alessandro Perina,et al.  Angry Crowds: Detecting Violent Events in Videos , 2016, ECCV.

[38]  Frédo Durand,et al.  Synthesizing Images of Humans in Unseen Poses , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[39]  B. Ommer,et al.  Optogenetically stimulating intact rat corticospinal tract post-stroke restores motor control through regionalized functional circuit formation , 2017, Nature Communications.

[40]  Lihi Zelnik-Manor,et al.  Modifying Non-local Variations Across Multiple Views , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[41]  Bodo Rosenhahn,et al.  Multiple People Tracking Using Body and Joint Detections , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[42]  Maneesh Kumar Singh,et al.  Disentangling Factors of Variation with Cycle-Consistent Variational Auto-Encoders , 2018, ECCV.

[43]  Björn Ommer,et al.  CliqueCNN: Deep Unsupervised Exemplar Learning , 2016, NIPS.

[44]  Björn Ommer,et al.  Spatiotemporal Parsing of Motor Kinematics for Assessing Stroke Recovery , 2015, MICCAI.

[45]  Wongun Choi,et al.  Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[46]  Kevin M. Cury,et al.  DeepLabCut: markerless pose estimation of user-defined body parts with deep learning , 2018, Nature Neuroscience.

[47]  Mubarak Shah,et al.  Person-on-person violence detection in video data , 2002, Object recognition supported by user interaction for service robots.

[48]  Matthias Zwicker,et al.  Challenges in Disentangling Independent Factors of Variation , 2017, ICLR.

[49]  Frédo Durand,et al.  Video magnification in presence of large motions , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50]  Rainer Stiefelhagen,et al.  Drive&Act: A Multi-Modal Dataset for Fine-Grained Driver Behavior Recognition in Autonomous Vehicles , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[51]  Frédo Durand,et al.  Eulerian video magnification for revealing subtle changes in the world , 2012, ACM Trans. Graph..

[52]  Nicu Sebe,et al.  Learning Deep Representations of Appearance and Motion for Anomalous Event Detection , 2015, BMVC.

[53]  Yann LeCun,et al.  Disentangling factors of variation in deep representation using adversarial training , 2016, NIPS.

[54]  Jitendra Malik,et al.  SlowFast Networks for Video Recognition , 2018, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[55]  Mubarak Shah,et al.  Real-World Anomaly Detection in Surveillance Videos , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[56]  Björn Ommer,et al.  Less Is More: Video Trimming for Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[57]  Kristin Branson,et al.  JAABA: interactive machine learning for automatic annotation of animal behavior , 2013, Nature Methods.