The integration of photogrammetry and deep learning methods can be powerful for Earth observation applications. Photogrammetric techniques make it possible to produce detailed geospatial products with cm-level positional accuracy, while deep learning enables automatic image classification, segmentation, and object detection. However, when dealing with large data sets, photogrammetric processing steps such as image orientation and dense point cloud generation incur high computational costs, whereas deep learning methods are fast at the inference step. Here, we explore the complementarity of deep learning and photogrammetry, aiming to generate geospatial information that is both accurate and fast to obtain. The main aim is to discuss the possibilities of using deep learning within the photogrammetric process. We conduct experiments to demonstrate the potential of the Mask R-CNN method, trained on the COCO dataset, to generate masks, which are essential for removing image observations of moving objects during the orientation (alignment) step.
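To illustrate how instance masks can be used to discard observations on moving objects, here is a minimal sketch. It is not the authors' pipeline. It assumes that per-instance boolean masks, COCO class names, and confidence scores have already been obtained from a Mask R-CNN model. The set `MOVING_CLASSES`, the 0.5 score threshold, and the helper names `combine_masks` and `filter_keypoints` are hypothetical choices made only for this example. The masks are merged into a single "moving object" mask, and any image keypoint falling on it is dropped before tie-point matching and orientation.

```python
import numpy as np

# Hypothetical choice: COCO class names treated as potentially moving objects.
MOVING_CLASSES = {"person", "bicycle", "car", "motorcycle", "bus", "truck"}


def combine_masks(instance_masks, labels, scores, image_shape, score_thresh=0.5):
    """Return the union of instance masks for moving classes above a score threshold.

    instance_masks: (N, H, W) boolean array, one mask per detected instance
    labels: sequence of N COCO class-name strings
    scores: sequence of N confidence scores
    image_shape: (H, W), used so that N == 0 still yields a valid empty mask
    """
    combined = np.zeros(image_shape, dtype=bool)
    for mask, label, score in zip(instance_masks, labels, scores):
        if label in MOVING_CLASSES and score >= score_thresh:
            combined |= np.asarray(mask, dtype=bool)
    return combined


def filter_keypoints(keypoints, moving_mask):
    """Drop keypoints (x = column, y = row, in pixels) lying on a moving-object pixel."""
    kp = np.asarray(keypoints, dtype=float).reshape(-1, 2)
    h, w = moving_mask.shape
    # Round sub-pixel coordinates to the nearest pixel and clamp to the image bounds.
    cols = np.clip(np.round(kp[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.round(kp[:, 1]).astype(int), 0, h - 1)
    keep = ~moving_mask[rows, cols]
    return kp[keep]


# Usage on a toy 4 x 6 image with two detections.
masks = np.zeros((2, 4, 6), dtype=bool)
masks[0, 1:3, 1:3] = True  # detected "car" (moving class)
masks[1, 0:2, 4:6] = True  # detected "potted plant" (static, ignored)
moving = combine_masks(masks, ["car", "potted plant"], [0.9, 0.8], (4, 6))
kept = filter_keypoints([(1.0, 1.0), (4.0, 0.0), (5.0, 3.0)], moving)
print(kept)
```

Only the keypoint on the car is removed. The keypoint on the static object survives because its class is not in the moving set. In a real workflow, the same combined mask could instead be passed to a feature detector that accepts a detection mask, so that no features are extracted on moving objects at all.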