SpineOne: A One-Stage Detection Framework for Degenerative Discs and Vertebrae

Spinal degeneration affects many older adults, office workers, and even younger people. Effective pharmacological or surgical interventions can help relieve degenerative spine conditions. However, the traditional diagnostic procedure is often laborious: clinical experts must first localize discs and vertebrae before making a pathological diagnosis. Machine learning systems developed to aid this procedure generally follow a two-stage methodology, performing anatomical localization first and pathological classification second. Towards more efficient and accurate diagnosis, we propose a one-stage detection framework, termed SpineOne, that simultaneously localizes and classifies degenerative discs and vertebrae from magnetic resonance imaging (MRI) slices. SpineOne is built upon three key techniques: 1) a new keypoint heatmap design that facilitates simultaneous keypoint localization and classification; 2) attention modules that better differentiate the representations of discs and vertebrae; and 3) a novel gradient-guided objective association mechanism that associates multiple learning objectives at the later training stage. Empirical results on the Spinal Disease Intelligent Diagnosis Tianchi Competition (SDID-TC) dataset of 550 exams demonstrate that our approach surpasses existing methods by a large margin.
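To make the first technique concrete, the following is a minimal sketch of one common way a keypoint heatmap can carry both location and class: each keypoint is rendered as a Gaussian in the channel of its class, and decoding reads the peak position and the winning channel together. This is an illustration under stated assumptions, not SpineOne's actual heatmap design, which the abstract does not specify. The function names, the Gaussian width `sigma`, the threshold, and the class indices (0 = normal, 1 = degenerative) are all hypothetical.

```python
import numpy as np


def class_aware_heatmap(keypoints, labels, num_classes, height, width, sigma=2.0):
    """Render one Gaussian per keypoint into the channel of its class.

    keypoints: list of (x, y) pixel coordinates (hypothetical input format).
    labels:    list of integer class indices, one per keypoint.
    """
    ys, xs = np.mgrid[0:height, 0:width]
    heatmap = np.zeros((num_classes, height, width), dtype=np.float32)
    for (x, y), c in zip(keypoints, labels):
        gaussian = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        # Element-wise max keeps overlapping Gaussians from summing above 1.
        heatmap[c] = np.maximum(heatmap[c], gaussian)
    return heatmap


def decode_peaks(heatmap, threshold=0.5):
    """Return (x, y, class, score) for every 3x3 local maximum above threshold."""
    score = heatmap.max(axis=0)   # best score over classes at each pixel
    cls = heatmap.argmax(axis=0)  # class that produced that score
    h, w = score.shape
    padded = np.pad(score, 1, mode="constant", constant_values=-np.inf)
    # Maximum over each pixel's 3x3 neighbourhood, built from shifted views.
    neighbourhood = np.max(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)],
        axis=0,
    )
    is_peak = (score == neighbourhood) & (score >= threshold)
    return [
        (int(x), int(y), int(cls[y, x]), float(score[y, x]))
        for y, x in np.argwhere(is_peak)
    ]


# Two hypothetical keypoints on a 48x24 slice: one normal, one degenerative.
target = class_aware_heatmap([(10, 12), (10, 30)], [0, 1],
                             num_classes=2, height=48, width=24)
detections = decode_peaks(target)
```

In a one-stage setup of this kind, a network would regress such a multi-channel map directly, so a single forward pass yields both where a structure is and which class it belongs to.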
