CNNs-Based RGB-D Saliency Detection via Cross-View Transfer and Multiview Fusion

Salient object detection from RGB-D images aims to utilize both the depth view and the RGB view to automatically localize objects of human interest in a scene. Although a few earlier efforts have been devoted to this problem in recent years, two major challenges remain: 1) how to leverage the depth view effectively to model depth-induced saliency and 2) how to combine the RGB view and the depth view optimally so as to make full use of the complementary information between them. To address these two challenges, this paper proposes a novel framework based on convolutional neural networks (CNNs), which transfers the structure of an RGB-based deep neural network to the depth view and automatically fuses the deep representations of both views to obtain the final saliency map. In the proposed framework, the first challenge is formulated as a cross-view transfer problem and addressed by task-relevant initialization and by adding deep supervision in the hidden layers. The second challenge is addressed by a multiview CNN fusion model in which a combination layer connects the representation layers of the RGB view and the depth view. Comprehensive experiments on four benchmark datasets demonstrate significant and consistent improvements of the proposed approach over state-of-the-art methods.
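To make the two components concrete, the following PyTorch sketch shows one plausible reading of the abstract: a depth branch whose weights are initialized from the RGB-based network where shapes permit (cross-view transfer via task-relevant initialization), an auxiliary prediction head on a hidden layer (deep supervision), and a combination layer that fuses the representation layers of the two views. All module names, layer sizes, and the choice of channel concatenation as the combination operator are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class ViewBranch(nn.Module):
    """One view-specific CNN trunk (RGB or depth)."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.stage1 = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        self.stage2 = nn.Sequential(
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))
        # Hypothetical deep-supervision head: predicts a coarse saliency
        # map from a hidden layer so the transferred branch receives a
        # direct, task-relevant gradient signal during training.
        self.aux_head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        h1 = self.stage1(x)
        h2 = self.stage2(h1)
        return h2, self.aux_head(h1)  # representation + auxiliary map

class MultiViewFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = ViewBranch(in_channels=3)
        self.depth = ViewBranch(in_channels=1)
        # Cross-view transfer as task-relevant initialization: copy RGB
        # weights into the depth branch where the shapes match (stage1
        # differs because the depth input has one channel). Sketch only.
        self.depth.stage2.load_state_dict(self.rgb.stage2.state_dict())
        # Combination layer connecting the two representation layers.
        self.fuse = nn.Sequential(
            nn.Conv2d(256, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1))

    def forward(self, rgb, depth):
        f_rgb, aux_rgb = self.rgb(rgb)
        f_d, aux_d = self.depth(depth)
        saliency = self.fuse(torch.cat([f_rgb, f_d], dim=1))
        return saliency, aux_rgb, aux_d  # aux maps feed extra losses
```

A forward pass with `rgb` of shape (N, 3, H, W) and `depth` of shape (N, 1, H, W) yields a coarse saliency map plus two auxiliary predictions, each of which can be given its own saliency loss, mirroring the deep supervision described in the abstract.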