Pose-Guided Person Image Synthesis for Data Augmentation in Pedestrian Detection

We present a data augmentation framework for pedestrian detection built on a pose-guided person image synthesis model. The framework improves the performance of state-of-the-art pedestrian detectors by generating new, unseen pedestrian training samples with controllable appearance and pose. It relies on a new latent-consistent adversarial variational auto-encoder (LAVAE), which combines the strengths of conditional variational auto-encoders and conditional generative adversarial networks to disentangle and reconstruct person images conditioned on target poses. An additional latent regression path preserves appearance information and enforces spatial alignment during pose transfer. LAVAE goes beyond existing work in restoring structural information and perceptual details from limited annotations, which in turn benefits pedestrian detection in automated driving scenarios. We perform extensive pedestrian detection and person image synthesis experiments on the EuroCity Persons dataset. Data augmentation with LAVAE significantly improves the accuracy of state-of-the-art pedestrian detectors, and LAVAE performs competitively with other generative models for person image synthesis.
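To make the combination of ingredients named above more concrete, the following is a minimal sketch of how a generator objective could add together a reconstruction term, a KL term (the variational auto-encoder part), an adversarial term (the generative adversarial part), and a latent regression term. It is an illustration under stated assumptions, not the LAVAE objective itself: the function names, the L1 distances, the non-saturating adversarial loss, and the weights `w_rec`, `w_kl`, `w_adv`, and `w_lat` are all hypothetical choices that the text above does not specify.

```python
import math

# Illustrative only: generic loss terms of a conditional VAE-GAN with a
# latent regression path. None of these choices are taken from the paper.

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one latent vector."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def l1(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def generator_adv_loss(d_fake_prob):
    """Non-saturating generator loss -log D(x_fake) for one sample."""
    return -math.log(d_fake_prob)

def total_generator_loss(x, x_rec, mu, logvar, d_fake_prob, z, z_reg,
                         w_rec=1.0, w_kl=0.01, w_adv=1.0, w_lat=0.5):
    """Weighted sum of the four terms; all weights are hypothetical.

    x, x_rec    : flattened target image and its reconstruction
    mu, logvar  : encoder outputs for the appearance latent
    d_fake_prob : discriminator probability that the synthesized image is real
    z, z_reg    : latent code used for synthesis and the code re-encoded
                  from the synthesized image (latent regression path)
    """
    return (w_rec * l1(x, x_rec)
            + w_kl * kl_to_standard_normal(mu, logvar)
            + w_adv * generator_adv_loss(d_fake_prob)
            + w_lat * l1(z, z_reg))
```

In this sketch the latent regression term penalizes the gap between the code fed to the generator and the code recovered from the generated image, which is one common way to encourage appearance information to survive a pose change.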