Catiloc: Camera Image Transformer for Indoor Localization

This paper addresses the problem of indoor camera localization from a single image. The task is difficult because no GPS is available indoors, and the test images can differ from the training data gathered for the positioning system through occlusion, illumination changes, or repetitive textures and patterns, any of which can easily fool a positioning system. Following the idea of self-attention and transformer networks, we customize the feature extraction stage and the output block of a transformer recently used for image recognition so that it outputs the camera's 3D position and 4D orientation quaternion. We also employ an engineering implementation trick, evaluate the results on the 7-Scenes dataset, and compare them with other state-of-the-art methods. The results show consistently better performance with a simpler and faster configuration.
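
To make the output format concrete, the following minimal sketch shows one way a 7-value network output could be split into a 3D position and a 4D quaternion. It is an illustration under assumptions, not the implementation used in this work: the function name `split_pose`, the ordering of the seven values (position first, quaternion last), and the normalization of the quaternion to unit length are all hypothetical choices made for the example.

```python
import math


def split_pose(output):
    """Split a 7-element output into a 3D position and a unit quaternion.

    Assumed layout (hypothetical): [x, y, z, q0, q1, q2, q3].
    """
    if len(output) != 7:
        raise ValueError("expected 7 values: 3 for position, 4 for quaternion")

    position = tuple(output[:3])
    quat = output[3:]

    # A rotation is represented by a unit quaternion, so the raw
    # 4D output is rescaled to unit length (an assumption of this sketch).
    norm = math.sqrt(sum(q * q for q in quat))
    if norm == 0.0:
        raise ValueError("quaternion part has zero norm")
    quaternion = tuple(q / norm for q in quat)

    return position, quaternion


# Example with made-up numbers, not data from the paper.
position, quaternion = split_pose([0.5, -1.2, 2.0, 2.0, 0.0, 0.0, 0.0])
```

Normalizing at the output keeps the predicted orientation a valid rotation regardless of the scale of the raw values.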