Subsequent Keyframe Generation for Visual Servoing

In this paper, we study the problem of autonomously and reliably positioning a camera with respect to an object when only the object itself is known, but not the rest of the scene. We propose to combine the efficiency of a visual servoing scheme with the generalization ability of a generative adversarial network. The paper describes how to efficiently create a synthetic dataset for training a network that predicts an intermediate visual keyframe between two images. Successive predictions are used as visual features to converge autonomously towards the desired pose, even for large displacements. We show that the proposed method requires no prior knowledge of the scene appearance beyond the object itself, while remaining robust to varied lighting conditions and specular surfaces. We provide experimental results, both in simulation and on a real service robot platform, to validate and evaluate the effectiveness, robustness, and accuracy of our approach.
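The core control loop implied by the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: images are abstracted as feature vectors, and `predict_keyframe` is a hypothetical stand-in (here a simple midpoint interpolation) for the trained generative network that would predict an intermediate keyframe between the current and desired views. The servo loop repeatedly predicts a keyframe and steps towards it until the error falls below a tolerance.

```python
import numpy as np

def predict_keyframe(current, desired):
    """Stand-in for the trained GAN keyframe predictor (hypothetical).

    In the actual method, a generative network would synthesize an
    intermediate view; here we simply interpolate feature vectors.
    """
    return 0.5 * (current + desired)

def keyframe_servo(current, desired, gain=0.5, tol=1e-3, max_iters=100):
    """Iteratively servo towards the desired view via predicted keyframes.

    Each iteration targets the predicted intermediate keyframe rather
    than the distant goal, which is what makes large displacements
    tractable for a local visual servoing law.
    """
    trajectory = [current.copy()]
    for _ in range(max_iters):
        if np.linalg.norm(desired - current) < tol:
            break
        keyframe = predict_keyframe(current, desired)
        # Classical proportional visual servoing step towards the keyframe.
        current = current + gain * (keyframe - current)
        trajectory.append(current.copy())
    return current, trajectory

start = np.array([0.0, 0.0, 0.0])   # features of the initial view
goal = np.array([1.0, -0.5, 2.0])   # features of the desired view
final, trajectory = keyframe_servo(start, goal)
```

Because each step closes a fixed fraction of the gap to the keyframe, the error contracts geometrically and the loop converges well within the iteration budget; the actual method would re-predict the keyframe from real camera images at each step.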