Comparison of Deep-Learning-Based Segmentation Models: Using Top View Person Images

Image segmentation is considered a key research topic in computer vision and is pivotal in a broad range of real-life applications. Recently, the emergence of deep learning has driven significant advances in image segmentation; the resulting systems are now capable of recognizing, segmenting, and classifying objects of specific interest in images. Most of these techniques have focused primarily on the asymmetric field of view or frontal view objects. This work explores widely used deep-learning-based models for person segmentation using a top view data set. The first model employed in this work is a Fully Convolutional Network (FCN) with a ResNet-101 architecture. The network consists of a set of max-pooling and convolution layers that predict pixel-wise class labels and the segmentation mask. The second model, U-Net, is based on FCN and has an encoder-decoder architecture. It mainly comprises a contracting path, called the encoder, which captures the context in the image, and a symmetric expanding path, called the decoder, which enables precise localization. The third model used for top view person segmentation is DeepLabV3, also with an encoder-decoder architecture. The encoder consists of a trained Convolutional Neural Network (CNN) that encodes feature maps of the input image. The decoder up-samples and reconstructs the output using the important information extracted by the encoder. All segmentation models are first tested using pre-trained models (trained on a frontal view data set). To improve performance, these models are further trained using a person data set captured from a top view. The output of each model is a segmented person in the top view images. The experimental results demonstrate the effectiveness of the segmentation models, which achieve $IoU$ of 83%, 84%, and 86% and $mIoU$ of 80%, 82%, and 84% for FCN, U-Net, and DeepLabV3, respectively. Furthermore, the output results are discussed along with possible future directions.
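The $IoU$ and $mIoU$ figures reported above can be illustrated with a minimal sketch. It assumes the common definitions: $IoU$ for a class is the number of pixels labelled with that class in both the prediction and the ground truth, divided by the number labelled with it in either, and $mIoU$ is the mean of the per-class values (here person and background). The abstract does not state the exact evaluation protocol, so the function names `iou` and `mean_iou`, the label encoding (1 = person, 0 = background), and the toy masks are illustrative assumptions, not the authors' code.

```python
def iou(pred, target, cls):
    """Intersection over union for one class, given flat sequences of labels."""
    pairs = list(zip(pred, target))
    inter = sum(1 for p, t in pairs if p == cls and t == cls)
    union = sum(1 for p, t in pairs if p == cls or t == cls)
    # If the class appears in neither mask, IoU is undefined.
    return inter / union if union else float("nan")


def mean_iou(pred, target, classes):
    """Mean of per-class IoU, skipping classes where IoU is undefined."""
    scores = [iou(pred, target, c) for c in classes]
    valid = [s for s in scores if s == s]  # NaN is the only value not equal to itself
    return sum(valid) / len(valid) if valid else float("nan")


# Toy flattened masks (assumed encoding: 1 = person, 0 = background).
pred = [0, 0, 1, 1, 1, 0, 0, 0]
target = [0, 0, 1, 1, 0, 0, 0, 0]

person_iou = iou(pred, target, 1)           # 2 shared person pixels / 3 in the union
overall = mean_iou(pred, target, [0, 1])    # average of background and person IoU
print(f"person IoU = {person_iou:.3f}, mIoU = {overall:.3f}")
```

In this toy case the person class scores 2/3 and the background class 5/6, so the mean is 0.75; on real masks the same functions would be applied to the flattened prediction and ground-truth label maps.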
