Image-based virtual try-on systems, which aim to transfer a target clothing item onto the corresponding region of a person, have recently received considerable attention. However, existing methods still struggle to generate photo-realistic try-on images while preserving non-target details (Fig. 1). To resolve this issue, we present a novel virtual try-on network, DP-VTON. First, a clothing warping module combines pixel transformation with feature transformation to warp the target clothing. Second, a semantic segmentation prediction module predicts a semantic segmentation map of the person wearing the target clothing. Third, an arm generation module generates the arm regions of the reference image that will change after try-on. Finally, the warped clothing, the semantic segmentation map, the arm image, and other non-target details (e.g., face, hair, bottom clothes) are fused for try-on image synthesis. Extensive experiments demonstrate that our system achieves state-of-the-art virtual try-on performance both qualitatively and quantitatively.