Precise citywide mobile traffic prediction is of great significance for intelligent network planning and proactive service provisioning. Current traffic prediction approaches mainly focus on training a well-performing model for cities with large amounts of mobile traffic data. For cities with scarce data, however, prediction performance is severely limited. To tackle this problem, we propose CCTP, a novel cross-city deep transfer learning framework for citywide mobile traffic prediction in data-scarce cities. Specifically, we first present a novel spatial-temporal learning model and pre-train it on abundant data from a source city to obtain prior knowledge of mobile traffic dynamics. We then devise an efficient generative adversarial network (GAN) based cross-domain adapter that aligns the distributions of the target and source data. To deal with data scarcity in some clusters of the target city, we further design an inter-cluster transfer learning strategy for performance enhancement. Extensive experiments on real-world mobile traffic datasets demonstrate that the proposed CCTP framework achieves superior performance in citywide mobile traffic prediction under data scarcity.
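The pre-train-then-adapt idea summarized above can be illustrated with a deliberately small sketch. This is not the CCTP model: a linear autoregressive predictor stands in for the deep spatial-temporal network, and ridge regression shrunk toward the source weights stands in for the adaptation stage (the GAN-based adapter and the inter-cluster strategy are not modeled). The synthetic series, the helper names `make_windows` and `fit_ridge`, the lag length, and the regularization values are all assumptions made for illustration.

```python
import numpy as np

def make_windows(series, lag):
    """Turn a 1-D traffic series into (lagged inputs, next-step target) pairs."""
    X = np.stack([series[i:i + lag] for i in range(len(series) - lag)])
    y = series[lag:]
    return X, y

def fit_ridge(X, y, lam, prior=None):
    """Ridge regression whose weights are shrunk toward `prior` (zero if None).

    Solves min_w ||X w - y||^2 + lam * ||w - prior||^2 in closed form.
    """
    d = X.shape[1]
    prior = np.zeros(d) if prior is None else prior
    A = X.T @ X + lam * np.eye(d)
    b = X.T @ y + lam * prior
    return np.linalg.solve(A, b)

# Synthetic stand-ins (assumed): a long source-city series, a short target-city one.
rng = np.random.default_rng(0)
t_src = np.arange(2000)
t_tgt = np.arange(40)
source = np.sin(2 * np.pi * t_src / 24) + 0.1 * rng.standard_normal(t_src.size)
target = 1.2 * np.sin(2 * np.pi * t_tgt / 24) + 0.1 * rng.standard_normal(t_tgt.size)

lag = 6
Xs, ys = make_windows(source, lag)
Xt, yt = make_windows(target, lag)

# Step 1: "pre-train" on abundant source data.
w_src = fit_ridge(Xs, ys, lam=1e-3)

# Step 2: adapt to the scarce target data, using the source weights as a prior.
w_tgt = fit_ridge(Xt, yt, lam=1.0, prior=w_src)
```

The `lam` argument in the second call controls how strongly the target model is pulled toward the source weights: a very large value effectively reuses the source model unchanged, while a value near zero fits the scarce target data alone.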