Improved Generalization of Heading Direction Estimation for Aerial Filming Using Semi-Supervised Regression

In autonomous aerial filming of a moving actor (e.g., a person or a vehicle), it is crucial to estimate the actor's heading direction accurately from visual input. However, models trained for similar tasks, such as pedestrian collision risk analysis and human-robot interaction, are very difficult to generalize to aerial filming because the data distributions differ. To improve generalization with less labeled data, this paper presents a semi-supervised algorithm for the heading direction estimation problem. We use temporal continuity as the unsupervised signal to regularize the model and achieve better generalization. The semi-supervised algorithm is applied in both the training and testing phases, which improves testing performance by a large margin. We show that leveraging unlabeled sequences significantly reduces the amount of labeled data required. We also discuss several important details for improving performance: balancing the labeled and unlabeled losses, and making good combinations. Experimental results show that our approach robustly outputs the heading direction for different types of actors. In the aerial filming task, the approach also improves the aesthetic value of the resulting video.
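The abstract does not state the exact loss functions. As a rough illustration only, the sketch below assumes that each heading is regressed as a unit vector (cos θ, sin θ), which avoids the wrap-around at 2π, and that temporal continuity is enforced as a squared penalty between predictions on consecutive frames of an unlabeled sequence. The function names and the weight `lam` are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float]


def angle_to_vec(theta: float) -> Vec:
    """Encode a heading angle (radians) as a unit vector, so 359 deg and 1 deg end up close."""
    return (math.cos(theta), math.sin(theta))


def sq_dist(a: Vec, b: Vec) -> float:
    """Squared Euclidean distance between two 2-D vectors."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def supervised_loss(preds: Sequence[Vec], labels: Sequence[float]) -> float:
    """Mean squared error between predicted vectors and ground-truth heading angles (labeled frames)."""
    if not preds:
        return 0.0
    return sum(sq_dist(p, angle_to_vec(y)) for p, y in zip(preds, labels)) / len(preds)


def continuity_loss(seq_preds: Sequence[Vec]) -> float:
    """Mean squared change between predictions on consecutive frames of one unlabeled sequence."""
    if len(seq_preds) < 2:
        return 0.0
    diffs = [sq_dist(seq_preds[t + 1], seq_preds[t]) for t in range(len(seq_preds) - 1)]
    return sum(diffs) / len(diffs)


def total_loss(
    labeled_preds: Sequence[Vec],
    labels: Sequence[float],
    unlabeled_seqs: Sequence[Sequence[Vec]],
    lam: float = 0.1,  # hypothetical weight balancing the unlabeled term against the labeled term
) -> float:
    """Hypothetical semi-supervised objective: labeled MSE plus lam times the mean continuity penalty."""
    sup = supervised_loss(labeled_preds, labels)
    unsup = sum(continuity_loss(s) for s in unlabeled_seqs) / max(len(unlabeled_seqs), 1)
    return sup + lam * unsup


if __name__ == "__main__":
    # One labeled frame predicted exactly, plus one unlabeled sequence with a single abrupt 90-degree jump.
    loss = total_loss([(1.0, 0.0)], [0.0], [[(1.0, 0.0), (0.0, 1.0), (0.0, 1.0)]], lam=0.1)
    print(loss)
```

Because the continuity term on its own is minimized by predicting a constant heading, the labeled term is what anchors the outputs to real directions; `lam` sets the trade-off between the two, which is one way to read the labeled/unlabeled loss balancing mentioned above.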
