Compressing Local Descriptor Models for Mobile Applications

Feature-based image matching has been significantly improved through the use of deep learning and new large datasets. However, there has been little work addressing the tradeoffs between computational cost, model size, and matching accuracy for state-of-the-art models. In this paper, we consider these practical aspects and improve the state-of-the-art HardNet model through the use of depthwise separable layers and an efficient tensor decomposition. We propose the Convolution-Depthwise-Pointwise (CDP) layer, which partitions the weights into a low-rank and a full-rank decomposition to exploit the naturally emergent structure in the convolutional weights. We achieve an 8× reduction in the number of parameters of the HardNet model and a 13× reduction in computational complexity, while sacrificing less than 1% of the overall accuracy across the HPatches benchmarks. To further demonstrate the generalisation of this approach, we apply it to other state-of-the-art descriptor models, where we are able to achieve a significant performance improvement.
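To give a sense of why depthwise separable layers reduce model size, the sketch below compares the weight count of a standard convolution with that of a depthwise convolution followed by a pointwise (1×1) convolution. It is an illustration only: the layer dimensions (128 input channels, 128 output channels, 3×3 kernel) are assumed for the example and are not taken from HardNet, biases are ignored, and the code does not implement the CDP layer itself.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer dimensions, chosen only for illustration.
c_in, c_out, k = 128, 128, 3

standard = conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = standard / separable

print(f"standard conv weights:       {standard}")
print(f"depthwise separable weights: {separable}")
print(f"reduction factor:            {ratio:.2f}x")
```

The reduction factor for a single layer depends on its channel counts and kernel size, so the figure computed here should not be read as the whole-model reduction reported above.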