Image domain adaptation of simulated data for human pose estimation

Leveraging the power of deep neural networks, single-person pose estimation has made substantial progress in recent years. More recently, multi-person pose estimation has also grown in importance, driven mainly by the high demand for reliable video surveillance systems in public security. Efforts to improve the performance of such systems are still limited by the insufficient amount of available training data. This work addresses this lack of labeled data: by reducing the commonly encountered domain shift between synthetic images from computer game graphics engines and real-world data, annotated training data can be provided at zero labeling cost. To this end, generative adversarial networks are applied as a domain adaptation framework, adapting the data of a novel synthetic pose estimation dataset to several real-world target domains. State-of-the-art domain adaptation methods are extended to meet the important requirement of exact content preservation between synthetic and adapted images, so that the ground-truth pose annotations remain valid after adaptation. Subsequent experiments indicate the improved suitability of the adapted data: human pose estimators trained on it outperform those trained on purely synthetic images.
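The abstract does not spell out how content preservation is enforced, but a common way to extend an adversarial adaptation objective for this purpose is to add a pixel-wise penalty that ties the adapted image to its synthetic source, so the annotated joint positions stay valid. Below is a minimal NumPy sketch of such a generator objective; the function name, the non-saturating adversarial term, and the L1 content term with weight `lam` are illustrative assumptions, not the thesis's actual formulation.

```python
import numpy as np

def content_preserving_generator_loss(d_fake, synthetic, adapted, lam=10.0):
    """Hypothetical generator objective for content-preserving domain adaptation.

    d_fake    -- discriminator scores in (0, 1) for the adapted images
    synthetic -- source images from the graphics engine
    adapted   -- generator outputs (same shape as `synthetic`)
    lam       -- weight of the content-preservation term (assumed value)
    """
    # Non-saturating adversarial term: push adapted images toward "real".
    adversarial = -np.mean(np.log(d_fake + 1e-8))
    # L1 content term: keep the adapted image close to its synthetic source,
    # so pose annotations transfer unchanged.
    content = np.mean(np.abs(adapted - synthetic))
    return adversarial + lam * content
```

With `adapted == synthetic` the content term vanishes and only the adversarial term remains; a larger `lam` trades realism for stricter content (and thus annotation) preservation.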
