Instance segmentation of 3D point clouds is a key step in scene understanding and is widely used in indoor robot navigation, outdoor autonomous driving, and other fields. However, research in this area is still at an early stage. Instance segmentation must predict both the semantic label and the instance label of each point, so semantic segmentation can be regarded, to some extent, as its foundation. Motivated by this, we design a voxel-based branch built on convertible sparse convolution and residual optimization modules, together with a point-based branch that lets the network maintain a high-resolution representation. The two branches are then combined to refine the semantic segmentation results. Because breadth-first search (BFS) clustering performs well on indoor point clouds and is simple to implement, we use it to group the points that belong to the same instance and thereby obtain the instance segmentation result. The proposed method was evaluated on the indoor dataset ScanNet v2 and achieved relatively good instance segmentation precision.
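The BFS grouping step mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name `bfs_cluster`, the parameters `radius` and `min_points`, and the rule that two points are linked when they share a semantic label and lie within `radius` of each other are all assumptions made for illustration. The neighbor search is brute force for readability; a practical version would likely use a spatial index.

```python
from collections import deque
from math import dist


def bfs_cluster(points, labels, radius=0.05, min_points=1):
    """Group points into instances with BFS over a radius-neighbor graph.

    Assumption (hypothetical rule): two points are linked when they have the
    same predicted semantic label and are at most `radius` apart.
    Returns one instance id per point; -1 marks points whose cluster has
    fewer than `min_points` members.
    """
    n = len(points)
    instance = [-1] * n
    visited = [False] * n
    next_id = 0
    for seed in range(n):
        if visited[seed]:
            continue
        # Start a new BFS from every point that has not been reached yet.
        visited[seed] = True
        queue = deque([seed])
        cluster = [seed]
        while queue:
            i = queue.popleft()
            # Brute-force neighbor scan, O(n) per point (illustration only).
            for j in range(n):
                if (not visited[j]
                        and labels[j] == labels[i]
                        and dist(points[i], points[j]) <= radius):
                    visited[j] = True
                    queue.append(j)
                    cluster.append(j)
        # Keep the cluster only if it is large enough to count as an instance.
        if len(cluster) >= min_points:
            for i in cluster:
                instance[i] = next_id
            next_id += 1
    return instance


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.03, 0.0, 0.0), (1.0, 0.0, 0.0),
           (1.03, 0.0, 0.0), (0.06, 0.0, 0.0)]
    sem = [1, 1, 1, 1, 2]  # hypothetical semantic predictions
    print(bfs_cluster(pts, sem, radius=0.05))
```

In this toy input the first two points and the next two form separate instances because they are far apart, while the last point is kept apart by its different semantic label even though it is spatially close to the first group.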