DCT-Based Local Descriptor for Robust Matching and Feature Tracking in Wide Area Motion Imagery

We introduce a novel discrete cosine transform-based feature (DCTF) descriptor designed for both robustly matching features in aerial video and tracking features across wide-baseline oblique views in aerial wide area motion imagery (WAMI). By exploiting the mathematical properties of the discrete cosine transform (DCT), the DCTF descriptor preserves local structure more compactly in the frequency domain and outperforms widely used spatial-domain feature extraction methods, such as speeded up robust features (SURF) and scale-invariant feature transform (SIFT). The DCTF descriptor can be combined with other feature detectors, such as SURF and features from accelerated segment test (FAST), for which we provide experimental results. The performance of DCTF for image matching and feature tracking is evaluated on two city-scale aerial WAMI data sets (ABQ-215 and LA-351) and a synthetic aerial drone video data set generated with the Rochester Institute of Technology's Digital Imaging and Remote Sensing Image Generation tool (RIT-DIRSIG). DCTF is a compact 120-D descriptor, less than half the dimensionality of state-of-the-art deep learning-based descriptors such as SuperPoint, LF-Net, and DeepCompare, that requires no learning and is domain-independent. Despite its small size, the DCTF descriptor surprisingly produces the highest image matching accuracy (F₁ = 0.76 on ABQ-215), the longest maximum and average feature track lengths, and the lowest tracking error (0.3 pixel on LA-351) compared with both handcrafted and deep learning features.
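To make the frequency-domain idea concrete, the sketch below computes a compact DCT descriptor for a patch around a detected keypoint: extract a local window, apply a 2-D DCT, and keep the lowest-frequency coefficients, which concentrate most of the patch's structure. The 16x16 patch size, zigzag-style coefficient selection, DC-term removal, and L2 normalization here are illustrative assumptions, not the authors' exact DCTF recipe.

    # Minimal sketch of a DCT-based patch descriptor in the spirit of DCTF.
    # Patch size, coefficient count, and normalization are assumptions for
    # illustration; they are not the published DCTF parameters.
    import numpy as np
    from scipy.fft import dctn

    def dct_descriptor(image, kp, patch=16, dim=120):
        """Compact frequency-domain descriptor around keypoint kp = (x, y)."""
        x, y = int(round(kp[0])), int(round(kp[1]))
        h = patch // 2
        win = image[y - h:y + h, x - h:x + h].astype(np.float64)
        if win.shape != (patch, patch):      # keypoint too close to the border
            return None
        coeffs = dctn(win, norm="ortho")     # 2-D type-II DCT of the patch
        # Select the lowest-frequency coefficients by walking anti-diagonals
        # (zigzag-like order): most patch structure lives there, which is
        # what makes a DCT descriptor compact.
        order = sorted(((i, j) for i in range(patch) for j in range(patch)),
                       key=lambda ij: (ij[0] + ij[1], ij[0]))
        vec = np.array([coeffs[i, j] for i, j in order[:dim]])
        vec[0] = 0.0                         # drop the DC term (illumination offset)
        n = np.linalg.norm(vec)
        return vec / n if n > 0 else vec

With descriptors of this form, matching reduces to nearest-neighbor search under the L2 distance, e.g., pairing FAST keypoints from two WAMI frames by descriptor distance and filtering ambiguous matches with a ratio test.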
