Faces in the wild exhibit large variations and are hard to recognize in unconstrained scenarios. To tackle this issue, existing works synthesize variation-specific faces to augment recognition training. However, directly feeding generated samples into training causes negative transfer, because their feature space is shifted relative to that of normal samples. Instead, we propose a transitive distillation network (TDNet) that introduces a transitive domain to transfer cross-variation representations, alleviating the negative influence of synthesized data. Specifically, data covering diverse variations are first synthesized. We then construct distributions over the different variations as teachers to distill the student. Negative transfer is mitigated by adopting an adaptor as a bridge that shortens the large domain distance. To handle faces of different quality, we propose a novel strategy that defines easy and hard samples, which are used to select the appropriate transitive status. Meanwhile, bilateral classification with curriculum learning is proposed to gradually increase the confidence assigned to synthesized data, enhancing the robustness of representation learning. Experiments show that our method achieves superior performance on unconstrained face benchmarks such as IJB-C and SCface, while remaining competitive on general test sets.
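To make the adaptor-bridged distillation idea concrete, the following is a minimal sketch (PyTorch-style, with hypothetical module names, feature dimensions, and temperature not taken from the paper) of how a student feature could be passed through an adaptor before being matched to a variation-specific teacher. It illustrates the general technique only, not the authors' actual implementation.

```python
# Minimal sketch: distill a student from a variation-specific teacher through
# an adaptor that maps student features toward the teacher's domain.
# All names and dimensions are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Adaptor(nn.Module):
    """Projects student features into the (variation-specific) teacher space."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        return self.proj(feat)

def transitive_distill_loss(student_feat, teacher_feat, adaptor, tau: float = 4.0):
    """Temperature-scaled KL distillation applied after the adaptor bridge."""
    s = F.log_softmax(adaptor(student_feat) / tau, dim=1)   # adapted student logits
    t = F.softmax(teacher_feat.detach() / tau, dim=1)       # frozen teacher targets
    return F.kl_div(s, t, reduction="batchmean") * tau * tau

# Toy usage with random tensors standing in for real embeddings.
student_feat = torch.randn(8, 512, requires_grad=True)
teacher_feat = torch.randn(8, 512)
adaptor = Adaptor(512)
loss = transitive_distill_loss(student_feat, teacher_feat, adaptor)
loss.backward()
print(loss.item())
```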