Unmanned aerial vehicles (UAVs) are emerging as a powerful tool for industrial and smart city applications. Equipped with suitable sensors, UAVs can support tasks such as object detection, surveillance, traffic management, and urban planning. Deep learning has become a popular technique for processing high-dimensional data such as images and videos, which has led to several applications in surveillance and autonomous driving. However, aerial object detection remains understudied. This work proposes a deep learning approach for detecting objects in aerial scenes captured by UAVs. We first categorize current deep learning methods for aerial object detection and discuss how the task differs from general object detection. We then delineate the specific challenges involved and experimentally identify the key design decisions that significantly affect the accuracy and robustness of models. Building on these findings, we propose an optimized architecture that combines the best-performing design choices with the recent ResNeSt backbone to achieve superior performance in aerial object detection. Lastly, we outline several research directions to inspire further advances in aerial object detection.