Taxonomy of Vectorization Patterns of Programming for FIR Image Filters Using Kernel Subsampling and New One

Abstract: This study examines vectorized programming for finite impulse response (FIR) image filtering. FIR image filtering is fundamental to image processing, and several approximate acceleration algorithms exist for it. However, no sophisticated acceleration method exists for parameter-adaptive filters or other complex filters. For these cases, simple subsampling combined with code optimization is the only available solution. Under the current state of Moore's law, increases in central processing unit (CPU) clock frequency have stopped. Moreover, making use of ever more transistors is becoming extremely difficult because of power and thermal constraints. Most CPUs now have multi-core architectures, complicated cache memories, and short vector processing units, and this change has complicated vectorized programming. Therefore, we first organize the vectorization patterns of vectorized programming by revisiting general FIR filtering, in order to bring out the computing performance of CPUs. Furthermore, we propose a new vectorization pattern, which we term loop vectorization. These vectorization patterns mesh well with kernel subsampling, an acceleration method for general FIR filters. Experimental results show that the vectorization patterns are appropriate for general FIR filtering, and that the new vectorization pattern combined with kernel subsampling is effective for various filters: Gaussian range filtering, bilateral filtering, adaptive Gaussian filtering, randomly-kernel-subsampled Gaussian range filtering, randomly-kernel-subsampled bilateral filtering, and randomly-kernel-subsampled adaptive Gaussian filtering.
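
To make the idea of kernel subsampling concrete, the following is a minimal sketch of a randomly-kernel-subsampled Gaussian FIR filter. It is not the authors' implementation: the function names (`gaussian_kernel`, `subsample_kernel`, `fir_filter`), the `keep_ratio` parameter, the choice to always keep the centre tap, and the clamped border handling are all assumptions made for illustration. It uses scalar Python to show only the arithmetic, whereas the study targets vectorized CPU code.

```python
import math
import random


def gaussian_kernel(radius, sigma):
    """Return (dy, dx, weight) taps of a (2*radius+1)^2 Gaussian kernel."""
    taps = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            w = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))
            taps.append((dy, dx, w))
    return taps


def subsample_kernel(taps, keep_ratio, seed=0):
    """Randomly keep roughly keep_ratio of the taps.

    Assumption for this sketch: the centre tap is always kept so the
    subsampled kernel is never empty.
    """
    rng = random.Random(seed)
    return [t for t in taps
            if (t[0] == 0 and t[1] == 0) or rng.random() < keep_ratio]


def fir_filter(image, taps):
    """Normalized FIR filter over a 2D list, with clamped borders."""
    h, w = len(image), len(image[0])
    norm = sum(t[2] for t in taps)  # renormalize over the taps actually used
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy, dx, wt in taps:
                yy = min(max(y + dy, 0), h - 1)
                xx = min(max(x + dx, 0), w - 1)
                acc += wt * image[yy][xx]
            out[y][x] = acc / norm
    return out


# Usage: filter with the full kernel and with about a quarter of its taps.
full_taps = gaussian_kernel(radius=2, sigma=1.0)
sparse_taps = subsample_kernel(full_taps, keep_ratio=0.25, seed=0)
image = [[float((x + y) % 7) for x in range(8)] for y in range(8)]
full_result = fir_filter(image, full_taps)
sparse_result = fir_filter(image, sparse_taps)
```

The cost of the inner loop is proportional to the number of taps kept, so dropping taps trades accuracy for speed. Renormalizing by the sum of the retained weights keeps flat image regions unchanged.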
