Fusing depth, color, and skeleton data for enhanced real-time hand segmentation

As sensing technology has evolved, spatial user interfaces have become increasingly popular platforms for interacting with video games and virtual environments. In particular, recent advances in consumer-level motion tracking devices such as the Microsoft Kinect have sparked a dramatic increase in user interfaces controlled directly by the user's hands and body. However, existing skeleton tracking middleware created for these sensors, such as that developed by Microsoft and OpenNI, tends to focus on coarse, full-body motions and suffers from several well-documented limitations when attempting to track the positions of the user's hands and segment them from the background. In this paper, we present an approach that handles these failure cases more robustly by combining the original skeleton tracking positions with the color and depth information returned by the sensor.
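
As a rough illustration of the kind of fusion the abstract describes, the sketch below segments likely hand pixels by combining three cues: the tracked hand joint from the skeleton middleware (as a spatial and depth seed), the depth image (pixels near the joint's depth), and the color image (a crude skin-tone gate). This is not the paper's algorithm; the function name, thresholds, search radius, and HSV skin range are illustrative assumptions, and it presumes a depth map registered to the color image.

```python
# Illustrative sketch only: fuse a tracked hand-joint position with depth and
# color cues to produce a hand mask. Thresholds here are assumed, not tuned.
import numpy as np
import cv2  # OpenCV, used for color-space conversion and morphology


def segment_hand(depth_mm, color_bgr, hand_px, hand_depth_mm,
                 depth_tol_mm=120, hand_radius_px=90):
    """Return a binary mask of likely hand pixels.

    depth_mm      : HxW depth image in millimeters, registered to color_bgr
    color_bgr     : HxWx3 uint8 color image
    hand_px       : (x, y) pixel location of the tracked hand joint
    hand_depth_mm : depth of the hand joint in millimeters
    """
    h, w = depth_mm.shape

    # 1) Depth cue: keep pixels whose depth is close to the hand joint's depth.
    depth_mask = np.abs(depth_mm.astype(np.float32) - hand_depth_mm) < depth_tol_mm

    # 2) Skeleton cue: restrict attention to a window around the tracked joint.
    yy, xx = np.mgrid[0:h, 0:w]
    near_joint = (xx - hand_px[0]) ** 2 + (yy - hand_px[1]) ** 2 < hand_radius_px ** 2

    # 3) Color cue: a coarse skin-tone gate in HSV to reject background clutter
    #    that happens to lie at a similar depth (e.g., a nearby table edge).
    hsv = cv2.cvtColor(color_bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([0, 30, 60], np.uint8)
    upper = np.array([25, 180, 255], np.uint8)
    skin = cv2.inRange(hsv, lower, upper) > 0

    mask = (depth_mask & near_joint & skin).astype(np.uint8) * 255

    # Remove speckle with a small morphological opening and closing.
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
    return mask


if __name__ == "__main__":
    # Synthetic stand-in data so the sketch runs without a sensor attached.
    depth = np.full((240, 320), 2000, np.uint16)
    depth[100:160, 140:200] = 800             # fake hand blob at ~0.8 m
    color = np.zeros((240, 320, 3), np.uint8)
    color[100:160, 140:200] = (90, 120, 200)  # roughly skin-toned BGR patch
    mask = segment_hand(depth, color, hand_px=(170, 130), hand_depth_mm=800)
    print("hand pixels:", int(np.count_nonzero(mask)))
```

In practice, each cue compensates for the others' weaknesses: the skeleton joint alone drifts or jumps, depth thresholding alone merges the hand with nearby surfaces at similar range, and color alone is confounded by the face and arms; intersecting the three is the simplest form of the fusion idea.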