Medical image classification is an important task in computer-aided diagnosis systems. Its performance is critically determined by the descriptiveness and discriminative power of features extracted from images. With rapid development of deep learning, deep convolutional neural networks (CNNs) have been widely used to learn the optimal high-level features from the raw pixels of images for a given classification task. However, due to the limited amount of labeled medical images with certain quality distortions, such techniques crucially suffer from the training difficulties, including overfitting, local optimums, and vanishing gradients. To solve these problems, in this article, we propose a two-stage selective ensemble of CNN branches via a novel training strategy called deep tree training (DTT). In our approach, DTT is adopted to jointly train a series of networks constructed from the hidden layers of CNN in a hierarchical manner, leading to the advantage that vanishing gradients can be mitigated by supplementing gradients for hidden layers of CNN, and intrinsically obtain the base classifiers on the middle-level features with minimum computation burden for an ensemble solution. Moreover, the CNN branches as base learners are combined into the optimal classifier via the proposed two-stage selective ensemble approach based on both accuracy and diversity criteria. Extensive experiments on CIFAR-10 benchmark and two specific medical image datasets illustrate that our approach achieves better performance in terms of accuracy, sensitivity, specificity, and F1 score measurement.