Detecting visually salient scene areas and deriving their relative spatial relations from continuous street-view panoramas

ABSTRACT A salient scene is an area within an image whose visual elements stand out from the surrounding areas. Salient scenes are important for distinguishing landmarks in first-person-view (FPV) applications and for determining spatial relations in images. The relative spatial relations between salient scenes act as a visual guide that users of FPV applications readily accept and understand. However, current digitally navigable maps and location-based services do not provide users with information on visual spatial relations, and this shortcoming critically affects the popularity and innovation of FPV applications. This paper addresses the issue by proposing a method for detecting visually salient scene areas (SSAs) and deriving their relative spatial relations from continuous panoramas. The method comprises three steps. First, an SSA detection approach is introduced that fuses region-based saliency, derived from super-pixel segmentation, with the frequency-tuned saliency model; it focuses on a segmented landmark area in a panorama. Second, a street-view-oriented SSA generation method is introduced that matches and merges the visual SSAs from continuous panoramas. Third, a referencing approach based on continuous geotagged panoramas is introduced to derive the relative spatial relations of SSAs: the relative azimuth, the relative elevation angle, and the relative distance. Experimental results with Baidu street-view panoramas show that the error of the SSA relative azimuth angle is approximately ±6° (average error 2.67°) and that of the SSA relative elevation angle is approximately ±4° (average error 1.32°). These results demonstrate the feasibility of the proposed approach. The method can facilitate the development of FPV applications such as augmented reality (AR) and pedestrian navigation that rely on appropriate spatial relations.
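
To make the third step more concrete, the following is a minimal sketch of how a relative azimuth, elevation angle, and distance could be obtained from two geotagged panoramas. It is not the paper's implementation. It assumes equirectangular panoramas whose centre column faces a known compass heading and whose centre row is the horizon, and it estimates distance by plain triangulation over the baseline between the two capture points. The function names `pixel_to_angles` and `triangulate_distance` and all parameters are hypothetical.

```python
import math


def pixel_to_angles(x, y, width, height, heading_deg=0.0):
    """Map a pixel of an equirectangular panorama to (azimuth, elevation) in degrees.

    Assumption: the centre column faces `heading_deg` (clockwise from north)
    and the centre row corresponds to the horizon.
    """
    azimuth = (heading_deg + (x / width - 0.5) * 360.0) % 360.0
    elevation = (0.5 - y / height) * 180.0
    return azimuth, elevation


def triangulate_distance(baseline_m, az1_deg, az2_deg, track_deg):
    """Distance from the first capture point to a target seen in two panoramas.

    baseline_m: distance between the two capture points, in metres.
    az1_deg, az2_deg: azimuth of the target from the first and second point.
    track_deg: bearing from the first capture point to the second.
    Uses the law of sines on the triangle (point 1, point 2, target).
    """
    parallax = math.radians(az2_deg - az1_deg)
    if abs(math.sin(parallax)) < 1e-9:
        raise ValueError("sight lines are parallel; distance is undefined")
    return baseline_m * math.sin(math.radians(az2_deg - track_deg)) / math.sin(parallax)
```

For example, with a 10 m baseline heading due north (`track_deg=0`), a target seen at 45° from the first point and 90° from the second lies about 14.14 m from the first point. Small parallax angles make the estimate sensitive to azimuth error, so widely spaced panoramas give a more stable distance.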
