GPGPU-based implementation of a high-performing No-Reference (NR) IQA algorithm, BLIINDS-II

A relatively recent thrust in IQA research has focused on estimating the quality of a distorted image without access to the original (reference) image. Algorithms for this so-called no-reference IQA (NR IQA) have made great strides over the last several years, with some NR algorithms rivaling full-reference (FR) algorithms in terms of prediction accuracy. However, a large gap remains in runtime performance: NR algorithms are still significantly slower than FR algorithms, owing largely to their reliance on natural-scene statistics and other ensemble-based computations. To address this issue, this paper presents a GPGPU implementation, using NVIDIA's CUDA platform, of the popular Blind Image Integrity Notator using DCT Statistics (BLIINDS-II) algorithm [8], a state-of-the-art NR-IQA algorithm. We copy the image to the GPU and perform both the DCT and the statistical modeling on the GPU, executing these operations for all 5x5 pixel windows in parallel. We evaluated the implementation using the NVIDIA Visual Profiler and compared it to a previously optimized CPU C++ implementation. By employing suitable code optimizations, we reduced the runtime for each 512x512 image from approximately 270 ms to approximately 9 ms, including the time for all data transfers across the PCIe bus. We discuss our implementation of BLIINDS-II designed specifically for use on the GPU, the insights gained from the runtime analyses, and how the GPGPU techniques developed here can be adapted for use in other NR IQA algorithms.

Introduction

Effective and efficient quality assessment of visual content finds application in a variety of areas, ranging from quality monitoring of video delivery systems and comparison of compression techniques to image reconstruction.
Unfortunately, the benefits of recent advances in IQA and VQA have not carried over to real-world systems, owing largely to the long execution times of these algorithms even for a single image frame, as has been pointed out in multiple publications [1][2][3][9]. GPGPU-based implementations of three different full-reference IQA algorithms have been presented in [4], [5], and [6] with varying success. In time-sensitive applications such as quality-of-service monitoring in live broadcasting and video conferencing, a fast no-reference IQA algorithm is essential. Addressing this strong need [7] for real-time no-reference IQA, we apply GPGPU techniques to a high-performing no-reference IQA algorithm, BLIINDS-II. The objective of our project is to exploit the data parallelism in the BLIINDS-II NR-IQA algorithm by implementing it on a GPGPU. We aim to study the compute resources and memory bandwidth needed, along with the latency issues that follow from the data access pattern of the algorithm, and to propose suitable optimization techniques.

Overview of the BLIINDS-II algorithm

BLIINDS-II is a non-distortion-specific NR-IQA algorithm based on natural scene statistics (NSS). NSS models are statistical models that represent undistorted images of natural scenes. The algorithm seeks to predict the quality score of a distorted image by estimating the deviation of the distorted image from NSS models. The algorithm first learns how the NSS model parameters vary with varying levels and types of image distortion. This learning is later applied to predict quality scores from the features extracted from the distorted image. BLIINDS-II has been demonstrated to correlate well with human subjective image quality scores and compares very well with other high-performing FR IQA algorithms in the literature. Next we describe the overall framework of the BLIINDS-II algorithm, as shown in Figure 1. First, the 2-D DCT coefficients of the input image are computed.
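This per-window DCT is the operation the GPU version executes in parallel across all 5x5 windows. As a rough illustration of the arithmetic each window performs, consider the following minimal CPU sketch (not the authors' implementation; it assumes an orthonormal type-II 2-D DCT, and `dct5x5` is a hypothetical helper name):

```cpp
#include <cmath>

// Reference (CPU) computation of the orthonormal 2-D type-II DCT of one
// 5x5 pixel block. In the GPU version, each 5x5 window is handled by its
// own thread/block, so all windows execute this arithmetic concurrently.
// Illustrative sketch only, not the authors' implementation.
constexpr int N = 5;

void dct5x5(const double in[N][N], double out[N][N]) {
    const double pi = 3.14159265358979323846;
    for (int u = 0; u < N; ++u) {
        for (int v = 0; v < N; ++v) {
            double sum = 0.0;
            for (int x = 0; x < N; ++x)
                for (int y = 0; y < N; ++y)
                    sum += in[x][y] *
                           std::cos((2 * x + 1) * u * pi / (2.0 * N)) *
                           std::cos((2 * y + 1) * v * pi / (2.0 * N));
            // Orthonormal scaling factors for the DC and AC terms.
            double cu = (u == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
            double cv = (v == 0) ? std::sqrt(1.0 / N) : std::sqrt(2.0 / N);
            out[u][v] = cu * cv * sum;
        }
    }
}
```

In a CUDA port, one thread (or cooperative thread block) would evaluate this transform for its own 5x5 window, which is what makes the first stage of the pipeline embarrassingly parallel.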
These DCT coefficients are computed for each 5x5 pixel block of the image, with an overlap of one pixel between successive 5x5 blocks. The second step of the BLIINDS-II pipeline builds a parametric model of the extracted local DCT coefficients. Four parameters are computed for each 5x5 DCT block by applying a univariate generalized Gaussian density model to the non-DC coefficients of each block. These four parameters are described further below. In the third step, a feature vector is populated from the DCT coefficient parameters obtained in the previous step. Two features are extracted for each parameter: the parameter values across all 5x5 DCT blocks are averaged over the highest 10 percent of the values and over all 100 percent of the values. These two averages are the two features for each parameter. At this point we have 8 features extracted at the input image resolution (512x512). The features are extracted across three spatial scales, so the input image is downsampled twice and steps one through three are repeated to obtain a feature vector of length 24, 8 for each spatial scale. In the final step, a Bayesian inference approach is used to predict the image quality score from the extracted features. This involves computing the posterior probability of each possible quality score given the extracted set of features, using a multivariate generalized Gaussian density model trained on a subset of the LIVE IQA image database.

Figure 1. High-Level Overview of the BLIINDS-II Framework

IS&T International Symposium on Electronic Imaging 2017
Image Quality and System Performance XIV
https://doi.org/10.2352/ISSN.2470-1173.2017.12.IQSP-220
© 2017, Society for Imaging Science and Technology