Automatic and accurate 3D cardiac image segmentation plays a crucial role in cardiac disease diagnosis and treatment. Even though CNN based techniques have achieved great success in medical image segmentation, the expensive annotation, large memory consumption, and insufficient generalization ability still pose challenges to their application in clinical practice, especially in the case of 3D segmentation from high-resolution and large-dimension volumetric imaging. In this paper, we propose a few-shot learning framework by combining ideas of semi-supervised learning and self-training for whole heart segmentation and achieve promising accuracy with a Dice score of 0.890 and a Hausdorff distance of 18.539 mm with only four labeled data for training. When more labeled data provided, the model can generalize better across institutions. The key to success lies in the selection and evolution of high-quality pseudo labels in cascaded learning. A shape-constrained network is built to assess the quality of pseudo labels, and the self-training stages with alternative global-local perspectives are employed to improve the pseudo labels. We evaluate our method on the CTA dataset of the MM-WHS 2017 Challenge and a larger multi-center dataset. In the experiments, our method outperforms the state-of-the-art methods significantly and has great generalization ability on the unseen data. We also demonstrate, by a study of two 4D (3D+T) CTA data, the potential of our method to be applied in clinical practice.