Compared with ordinary optical images, remote sensing images are considerably more complicated. Because they are captured from above the Earth's surface at varying shooting angles, two problems arise. First, some target categories appear in complex imaging environments, which greatly increases the difficulty of detection. Second, remote sensing images often contain large and small targets at the same time, and the resulting large variation in target scale is difficult to handle. In this letter, we design a novel scenario context-aware bidirectional feature pyramid network (SCBi-FPN) to address these problems. The proposed network has two key modules. The scene context-aware module uses pyramid pooling to aggregate contextual information from different regions, yielding better global contextual information. The bidirectional feature pyramid network (Bi-FPN) module with squeeze-and-excitation (SE) blocks connects feature layers at different scales in a cross-scale manner. It performs weighted feature-map fusion and then passes the fused maps through the SE blocks, which enables the network to obtain more accurate information. Experiments show that the proposed network performs well compared with state-of-the-art methods. In particular, it achieves a mean average precision (mAP) of 92.92 on the publicly available NWPU VHR-10 dataset.
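To illustrate how pyramid pooling aggregates context from regions of different sizes, the following is a minimal pure-Python sketch on a single-channel feature map. It is not the authors' implementation. The bin sizes (1, 2, 3, 6) are borrowed from PSPNet and are an assumption. Nearest-neighbour upsampling stands in for bilinear interpolation, and the 1x1 convolutions a real module would apply after pooling are omitted. All function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # a single-channel H x W feature map


def adaptive_avg_pool(fmap: Grid, bins: int) -> Grid:
    """Average-pool an H x W map into a bins x bins grid (PyTorch-style bin edges)."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        r0, r1 = (i * h) // bins, math.ceil((i + 1) * h / bins)
        row = []
        for j in range(bins):
            c0, c1 = (j * w) // bins, math.ceil((j + 1) * w / bins)
            cells = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))  # mean over this region
        out.append(row)
    return out


def upsample_nearest(grid: Grid, h: int, w: int) -> Grid:
    """Nearest-neighbour upsampling of a small grid back to H x W."""
    gh, gw = len(grid), len(grid[0])
    return [[grid[r * gh // h][c * gw // w] for c in range(w)] for r in range(h)]


def pyramid_pooling(fmap: Grid, bin_sizes=(1, 2, 3, 6)) -> List[Grid]:
    """Return [original map] + one upsampled context map per pyramid level.

    Stacking these along a channel axis gives every position its local value
    plus region-level averages at several scales (bin size 1 = global context).
    """
    h, w = len(fmap), len(fmap[0])
    levels = [fmap]
    for b in bin_sizes:
        levels.append(upsample_nearest(adaptive_avg_pool(fmap, b), h, w))
    return levels
```

On a 6x6 map, the bin-1 level is the global mean broadcast to every position. The bin-6 level reproduces the input unchanged, so the coarser levels are what add the region-level context.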
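The abstract does not state the exact fusion formula. A common choice for Bi-FPN is the "fast normalized fusion" from EfficientDet, O = sum_i (w_i / (eps + sum_j w_j)) * I_i with w_i = ReLU(raw_i). The sketch below assumes that rule and follows it with a standard squeeze-and-excitation block: global average pooling, then FC + ReLU, then FC + sigmoid, then channel rescaling. It assumes the inputs have already been resized to a common resolution and omits FC biases. The weight matrices and function names are hypothetical placeholders, not values from the paper.

```python
import math
from typing import List, Sequence

# A feature map is stored as channels x (flattened H*W spatial positions).
FeatureMap = List[List[float]]


def fast_normalized_fusion(inputs: Sequence[FeatureMap], raw_weights: Sequence[float],
                           eps: float = 1e-4) -> FeatureMap:
    """Weighted fusion O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with w_i = ReLU(raw_i)."""
    w = [max(0.0, x) for x in raw_weights]  # keep fusion weights non-negative
    norm = eps + sum(w)
    n_ch, n_pos = len(inputs[0]), len(inputs[0][0])
    return [[sum(wi * fm[c][p] for wi, fm in zip(w, inputs)) / norm
             for p in range(n_pos)] for c in range(n_ch)]


def squeeze_excitation(fmap: FeatureMap, w1: List[List[float]],
                       w2: List[List[float]]) -> FeatureMap:
    """SE block: global average pool -> FC + ReLU -> FC + sigmoid -> rescale channels."""
    z = [sum(ch) / len(ch) for ch in fmap]                                 # squeeze: C values
    hidden = [max(0.0, sum(a * b for a, b in zip(row, z))) for row in w1]  # C -> C/r
    scale = [1.0 / (1.0 + math.exp(-sum(a * b for a, b in zip(row, hidden))))
             for row in w2]                                                # C/r -> C
    return [[s * v for v in ch] for s, ch in zip(scale, fmap)]


def fuse_then_recalibrate(inputs, raw_weights, w1, w2):
    """Hypothetical Bi-FPN node: weighted fusion followed by an SE block."""
    return squeeze_excitation(fast_normalized_fusion(inputs, raw_weights), w1, w2)
```

Normalizing by the sum of ReLU-clipped weights keeps each fused output a bounded convex-like combination of its inputs. The SE step then reweights whole channels by learned importance before the result is passed on.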