Efficient graphical-processor-unit parallelization algorithm for computing Eigen values

Abstract. Several leading-edge applications, such as pathology detection, biometric identification, and face recognition, rely mainly on blob and line detection. Eigenvalue computing is commonly employed for this detection because of its accuracy and robustness. However, it requires heavy computation, intensive memory access, and overlapping data between neighboring regions, all of which lengthen execution times. To overcome these limitations, we propose a new parallel strategy for implementing eigenvalue computing on a graphics processing unit (GPU). Our contributions are (1) optimizing instruction scheduling to reduce computation time, (2) partitioning the processing into blocks efficiently to increase the occupancy of the streaming multiprocessors, (3) splitting the input data efficiently across shared memory to benefit from its lower access time, and (4) proposing a new shared-memory data management scheme that avoids memory access conflicts and reduces memory bank accesses. Experimental results show that the proposed GPU parallel strategy achieves a speedup of 27 over a multithreaded implementation, 16 over a predefined function in the OpenCV library, and 8 over a predefined function in the cuBLAS library, all of which were run on a quad-core multi-CPU platform. We then evaluate the parallel strategy within an eigenvalue-based method for retinal thick vessel segmentation, which is essential for detecting ocular pathologies. Eigenvalue computing executes in 0.017 s on images from the Structured Analysis of the Retina database. Accordingly, we achieve real-time thick retinal vessel segmentation with an average execution time of about 0.039 s.
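As a rough illustration of the per-pixel work involved, the sketch below computes eigenvalues in closed form. It assumes each pixel carries a 2×2 symmetric matrix (for instance, second-derivative responses, as is common in blob and line detectors); the abstract does not state the matrix size, so this is an assumption, and the function names `symmetric_2x2_eigenvalues` and `eigenvalue_maps` are hypothetical. It is a sequential reference in plain Python, not the GPU strategy described above.

```python
import math


def symmetric_2x2_eigenvalues(a, b, d):
    """Return (smaller, larger) eigenvalues of the matrix [[a, b], [b, d]].

    Assumption: a 2x2 symmetric matrix per pixel; the abstract does not
    specify the matrix size or how its entries are obtained.
    """
    half_trace = 0.5 * (a + d)
    # Distance of the eigenvalues from the half trace; hypot is never negative.
    radius = math.hypot(0.5 * (a - d), b)
    return half_trace - radius, half_trace + radius


def eigenvalue_maps(hxx, hxy, hyy):
    """Apply the closed form to every pixel of three equally sized 2D lists.

    Each pixel is processed independently of the others, which is the
    property a GPU implementation can exploit by giving each thread a pixel.
    """
    return [
        [symmetric_2x2_eigenvalues(a, b, d) for a, b, d in zip(row_a, row_b, row_d)]
        for row_a, row_b, row_d in zip(hxx, hxy, hyy)
    ]
```

Because no pixel depends on another in this step, the inner call is the kind of work that could be assigned to one GPU thread, with the input rows staged in shared memory; how the paper actually schedules and lays out that data is not reproduced here.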
