Evaluation of Multi-Stream Fusion for Multi-View Image Set Comparison

We consider the problem of image set comparison, i.e., determining whether two image sets show the same unique object from approximately the same viewpoints. We propose to solve it by a multi-stream fusion of several image recognition paths. Immediate applications of this method include fraud detection, deduplication, and visual search. The contributions of this paper are a novel distance measure for the similarity of image sets and an experimental evaluation of several streams for the considered problem of same-car image set recognition. To determine the similarity score of two image sets (a score expressing the level of certainty that both sets represent the same object seen from the same set of views), we adapted a measure commonly applied in the evaluation of blind signal separation (BSS). This measure is independent of the number of images in a set and of the order of views in it. Separate streams were designed for object classification (where a class represents either a car type or a car model-and-view) and for object-to-object similarity evaluation (based on object features obtained either by a convolutional neural network (CNN) or by image keypoint descriptors). A late fusion by a fully-connected neural network (NN) completes the solution. The implementation has a modular structure: semantic segmentation uses Mask R-CNN (Mask regions with CNN features) with ResNet-101 as the backbone network; image feature extraction is based either on the DeepRanking neural network or on classic keypoint descriptors (e.g., the scale-invariant feature transform (SIFT)); and object classification is performed by two Inception V3 deep networks trained for car type-and-view and car model-and-view classification (4 views, 9 car types, and 197 car models are considered).
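The abstract does not state the formula of the set similarity measure. As a minimal sketch of the idea, the block below assumes an Amari-style performance index, a measure often used to evaluate BSS results, applied to a matrix of pairwise image similarities between the two sets. The function names `amari_style_error` and `set_similarity`, the normalization, and the requirement of strictly positive row and column maxima are assumptions made for illustration and may differ from the measure actually used in the paper.

```python
import numpy as np


def amari_style_error(sim):
    """Amari-style index of a pairwise similarity matrix (assumed form).

    sim[i, j] is the similarity between image i of set A and image j of set B.
    The result lies in [0, 1]; it is 0 when every row and every column has
    exactly one nonzero entry (a clean one-to-one view correspondence) and
    1 when all entries are equal (no view stands out).
    """
    s = np.abs(np.asarray(sim, dtype=float))
    if s.ndim != 2 or min(s.shape) < 2:
        raise ValueError("need a 2-D matrix with at least 2 rows and 2 columns")
    m, n = s.shape
    row_max = s.max(axis=1, keepdims=True)
    col_max = s.max(axis=0, keepdims=True)
    if np.any(row_max == 0) or np.any(col_max == 0):
        raise ValueError("every row and column needs a nonzero entry")
    # How far each row / column is from having a single dominant entry.
    row_term = (s / row_max).sum(axis=1) - 1.0  # each value in [0, n - 1]
    col_term = (s / col_max).sum(axis=0) - 1.0  # each value in [0, m - 1]
    # Averaging and dividing by the maximum removes the dependence on set size.
    return 0.5 * (row_term.mean() / (n - 1) + col_term.mean() / (m - 1))


def set_similarity(sim):
    """Similarity score in [0, 1]; higher means a cleaner view-to-view match."""
    return 1.0 - amari_style_error(sim)
```

Because the index only uses row-wise and column-wise sums and maxima, permuting the images within either set leaves the score unchanged, and the normalization keeps it in [0, 1] for sets of different sizes. Note that this form rewards a clear one-to-one structure in the matrix rather than large absolute similarity values.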
Experiments conducted on the Stanford Cars dataset led to the selection of the best system configuration, which outperforms a base approach and achieves a 67.7% GAR (genuine acceptance rate) at 3% FAR (false acceptance rate).
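For readers unfamiliar with the reported metric, the following sketch shows how a GAR at a fixed FAR can be computed from two lists of scores. It assumes that a pair of image sets is accepted when its score is at least the threshold; the function name `gar_at_far` and the sample scores are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def gar_at_far(genuine_scores, impostor_scores, target_far=0.03):
    """Best genuine acceptance rate among thresholds with FAR <= target_far.

    genuine_scores: scores of pairs that truly show the same object and views.
    impostor_scores: scores of pairs that do not.
    A pair is accepted when its score is >= the threshold (assumed convention).
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    # Every observed score is a candidate threshold; +inf rejects everything.
    thresholds = np.unique(np.concatenate([genuine, impostor, [np.inf]]))
    best_gar = 0.0
    for t in thresholds:
        far = np.mean(impostor >= t)  # fraction of impostor pairs accepted
        if far <= target_far:
            best_gar = max(best_gar, float(np.mean(genuine >= t)))
    return best_gar


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.1, 0.2, 0.3, 0.5]
print(gar_at_far(genuine, impostor, target_far=0.0))
```

With these sample scores, no impostor pair may be accepted at a target FAR of 0, so the threshold has to lie above 0.5 and three of the four genuine pairs are accepted.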
