The paper “Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment” [2] introduces an approach for detecting and tracking people under heavy occlusion, based on CNNs for semantic segmentation of top-view RGB-D data. The main contribution is a novel U-Net architecture, U-Net 3, which modifies the end of each layer with respect to earlier U-Net variants. To evaluate the new architecture, it is compared against other semantic segmentation networks from the literature. The implementation is written in Python using the Keras API on top of TensorFlow. The input data consist of depth frames extracted from Asus Xtion Pro Live OpenNI recordings (.oni). The dataset used for training and testing the networks has been manually labeled and is freely available, as is the source code. Each of the evaluated networks has a stand-alone Python script for training and testing, and a further Python script is provided for on-line prediction on OpenNI recordings (.oni). The networks are evaluated with several metrics (precision, recall, F1 score, Sørensen-Dice coefficient), whose implementations are included in the network scripts.
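As an illustration of the kind of metric the scripts include, below is a minimal sketch of a Sørensen-Dice coefficient and the corresponding loss in Keras/TensorFlow. This is an assumed implementation for binary head-segmentation masks, not the authors' exact code; the `model` variable in the usage comment is hypothetical.

```python
# Minimal sketch (assumed, not the paper's exact code) of a Dice coefficient
# metric for binary segmentation masks, using the Keras backend API.
import tensorflow as tf
from tensorflow.keras import backend as K

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soerensen-Dice coefficient between ground-truth and predicted masks."""
    y_true_f = K.flatten(K.cast(y_true, "float32"))
    y_pred_f = K.flatten(K.cast(y_pred, "float32"))
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_loss(y_true, y_pred):
    """Loss form commonly paired with the Dice metric when training segmentation networks."""
    return 1.0 - dice_coefficient(y_true, y_pred)

# Example usage when compiling a segmentation model (hypothetical `model`):
# model.compile(optimizer="adam", loss=dice_loss, metrics=[dice_coefficient])
```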
[1] Thomas Brox et al., “U-Net: Convolutional Networks for Biomedical Image Segmentation,” MICCAI, 2015.
[2] Emanuele Frontoni et al., “Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment,” 2018 24th International Conference on Pattern Recognition (ICPR), 2018.
[3] Hariharan Ravishankar et al., “Learning and Incorporating Shape Models for Semantic Segmentation,” MICCAI, 2017.
[4] Jian Sun et al., “Deep Residual Learning for Image Recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[5] Roberto Cipolla et al., “SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015.