Transfer Learning and Hyperparameter Optimization for Instance Segmentation with RGB-D Images in Reflective Elevator Environments

Elevators, a vital means of urban transportation, generally lack a proper emergency call system beyond an emergency button. For unconscious or otherwise incapacitated passengers, this can lead to lethal situations. A camera-based surveillance system with AI-based alerts that uses an elevator state machine can help passengers who are unable to initiate an emergency call. This work evaluates the applicability of RGB-D images as input for instance segmentation in the highly reflective environment of an elevator cabin. For object segmentation, a Region-based Convolutional Neural Network (R-CNN) deep learning model is adapted to use depth input data in addition to RGB by applying transfer learning, hyperparameter optimization, and re-training on a newly prepared elevator image dataset. Evaluations show that, with the chosen strategy, R-CNN instance segmentation can be applied to RGB-D data, thereby compensating for the poor image quality in noise-affected and reflective elevator cabins. The mean average precision (mAP) increases from 0.753 to 0.768 after additional depth data is incorporated, and with an additional FuseNet-FPN backbone on RGB-D it increases further to 0.794. With the proposed instance segmentation model, reliable elevator surveillance becomes feasible, as first prototypes and on-road tests show.
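One common way to let an RGB-pretrained network accept a fourth (depth) channel during transfer learning is to widen the first convolution layer and initialize the new depth kernel from the pretrained RGB kernels. The sketch below illustrates that idea in plain Python on nested lists. It is an assumption made for illustration, not the procedure used in this work: the abstract does not state how the depth input was attached, and the function names and the mean-of-RGB initialization are hypothetical choices.

```python
from typing import List

Kernel = List[List[float]]   # kernel_height x kernel_width
Filter = List[Kernel]        # in_channels x kernel_height x kernel_width


def inflate_rgb_filter_to_rgbd(rgb_filter: Filter) -> Filter:
    """Return a 4-channel filter built from a pretrained 3-channel one.

    The depth kernel is initialized as the element-wise mean of the
    R, G and B kernels (an assumed heuristic, not taken from the paper).
    """
    if len(rgb_filter) != 3:
        raise ValueError("expected exactly 3 input channels (R, G, B)")
    kh, kw = len(rgb_filter[0]), len(rgb_filter[0][0])
    depth_kernel = [
        [sum(rgb_filter[c][i][j] for c in range(3)) / 3.0 for j in range(kw)]
        for i in range(kh)
    ]
    # Copy the RGB kernels so the pretrained weights are left untouched.
    return [[row[:] for row in kernel] for kernel in rgb_filter] + [depth_kernel]


def inflate_first_layer(rgb_weights: List[Filter]) -> List[Filter]:
    """Apply the inflation to every output filter of the first conv layer."""
    return [inflate_rgb_filter_to_rgbd(f) for f in rgb_weights]


# Toy "pretrained" layer: 1 output filter, 3 input channels, 2x2 kernels.
pretrained = [[
    [[1.0, 2.0], [3.0, 4.0]],   # R
    [[2.0, 2.0], [2.0, 2.0]],   # G
    [[3.0, 2.0], [1.0, 0.0]],   # B
]]
rgbd = inflate_first_layer(pretrained)
print(len(rgbd[0]))   # number of input channels after inflation
print(rgbd[0][3])     # the newly initialized depth kernel
```

With such an initialization, the widened layer starts from the pretrained RGB responses and only the depth kernel has to be learned from scratch during re-training. A two-branch fusion backbone such as FuseNet-FPN instead processes depth in a separate encoder and would not need this step.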
