Vehicle detection in images captured by unmanned aerial vehicles (UAVs) plays an important role in traffic surveillance and urban planning, given the growing popularity of UAVs. However, class imbalance is an important factor restricting the performance of vehicle detectors. UAV images exhibit two types of class imbalance: foreground-background imbalance and foreground-foreground imbalance. In anchor-based single-stage detectors, many ground truths cannot be assigned to any anchor because their intersection over union with the anchors is too low, which makes the foreground-background imbalance more severe. We therefore propose a novel bag-based single-stage detector that treats each position on the feature map as a bag. A simple, adaptive definition of bags is proposed together with a positive-sample definition method, which ensures that more ground truths are assigned to proper bags. In addition, we use online hard example mining to control the proportion of positive and negative samples during training. To address the foreground-foreground imbalance, we propose a novel data augmentation algorithm that creates appropriate visual context for under-represented classes. Extensive experiments demonstrate the superiority of the proposed algorithm over other state-of-the-art solutions.

Impact Statement: Unmanned aerial vehicles (UAVs) have recently become widely used in intelligent transportation because of their low price and high flexibility, which makes vehicle detection in UAV images important for the automatic gathering of traffic information. However, the class imbalance problem, which is common in object detection when some classes appear far less frequently in the dataset than others, adversely affects the performance of vehicle detectors.
The data augmentation method and deep-learning-based vehicle detector proposed in this article reduce the negative impact and improve detection performance by at least 1.27% in mean average precision. In addition, compared with algorithms of similar detection performance, our method is at least 15 ms faster. The proposed method can benefit users in a wide variety of applications, including UAV transportation, traffic surveillance, and urban planning.
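
As an illustration of how online hard example mining can control the proportion of positive and negative samples, the following minimal Python sketch keeps every positive sample and only the highest-loss negatives. The function name `select_hard_negatives`, the 3:1 negative-to-positive ratio, and the toy loss values are assumptions made for illustration; they are not taken from the article, whose exact mining procedure may differ.

```python
from typing import List, Sequence


def select_hard_negatives(
    losses: Sequence[float],
    is_positive: Sequence[bool],
    neg_pos_ratio: int = 3,  # assumed ratio, not a value reported in the article
) -> List[int]:
    """Return sorted indices of the samples kept for the loss:
    all positives plus the hardest (highest-loss) negatives."""
    if len(losses) != len(is_positive):
        raise ValueError("losses and is_positive must have the same length")

    positives = [i for i, pos in enumerate(is_positive) if pos]
    negatives = [i for i, pos in enumerate(is_positive) if not pos]

    # Cap the number of negatives at neg_pos_ratio times the number of positives.
    max_negatives = neg_pos_ratio * len(positives)

    # "Hard" negatives are the ones with the largest current loss.
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)[:max_negatives]

    return sorted(positives + hardest)


# Toy example (hypothetical values): one positive sample and six background samples.
losses = [0.9, 0.1, 0.8, 0.05, 0.7, 0.6, 0.02]
is_positive = [True, False, False, False, False, False, False]
kept = select_hard_negatives(losses, is_positive, neg_pos_ratio=3)
print(kept)  # [0, 2, 4, 5]
```

In this toy example, the positive sample and the three highest-loss negatives are kept, so the samples contributing to the loss have a 3:1 negative-to-positive ratio regardless of how many easy background positions the image contains.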