In this article, we propose the suppositional mesh-based unsupervised multiview stereo network (SMU-MVSNet), a conceptually innovative method that overcomes the challenges posed by object occlusions and textureless areas by exploiting the intrinsic structural information of the target scene. Specifically, we introduce the suppositional mesh (SM) to approximate the scene surface. Based on the SM, we first propose single-view occlusion reasoning, with which occlusion masks can be efficiently generated to handle scene occlusions. Second, we design a vertex-face normal consensus loss to enforce a geometric constraint on intractable scene areas. Moreover, we apply all of these contributions to upsampled high-resolution depth maps, which further taps the potential of SMU-MVSNet. We demonstrate the effectiveness of the components in isolation and in combination on the DTU dataset and show excellent generalization ability on the Tanks and Temples benchmark without any fine-tuning.