Acceleration of arithmetic processing with CAM-based massive-parallel SIMD matrix core

Multimedia applications such as digital image compression, digital video compression, and digital audio processing are now routinely executed on mobile devices, so the processing core in a mobile device needs both high performance and programmability. Multimedia applications generally consist of repeated arithmetic operations and table-lookup coding operations. To speed up both kinds of operation on a single processing core, a Content Addressable Memory-based massive-parallel SIMD matrix core (CAMX) is proposed. CAMX serves as an accelerator for a mobile CPU core. It offers highly parallel processing and is built with two CAM modules that are used for fast table lookup. This paper shows that CAMX can perform both parallel repeated arithmetic operations and table-lookup coding operations. Assuming a 1.4 GHz operating frequency, the AND, OR, XOR, and ADD instructions process 1,024 entries of 128-bit data in parallel at 0.34 GOPS (giga operations per second); the search operation searches 1,024 entries of 128-bit data in parallel at 0.35 GOPS; and multiplication processes 1,024 entries of 4-bit data in parallel at 0.34 GOPS. For multiplication, the table-lookup-based approach requires about 15% fewer total clock cycles than bit-serial processing.
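The three kinds of operation named in the abstract (parallel search, table-lookup multiplication, and bit-serial multiplication) can be illustrated with a small functional model. This is a minimal software sketch, not the CAMX hardware or its instruction set: the function names, the 16 x 16 product table, and the shift-and-add formulation are assumptions made for illustration, and the model does not reproduce the cycle counts or the 15% figure reported above.

```python
# Functional sketch only. All names below are hypothetical and chosen for
# illustration; the abstract does not specify how CAMX implements these steps.

ENTRIES = 1024      # number of parallel entries, as stated in the abstract
NIBBLE_MASK = 0xF   # multiplication operands are 4-bit values

# Assumed lookup table holding every product of two 4-bit operands.
MUL_TABLE = [[a * b for b in range(16)] for a in range(16)]


def cam_search(entries, key):
    """Return the indices of all entries equal to key.

    A CAM compares every stored word against the key at the same time;
    the sequential loop here only models the result, not the timing.
    """
    return [i for i, word in enumerate(entries) if word == key]


def multiply_table_lookup(xs, ys):
    """Multiply element-wise by reading precomputed products from a table."""
    return [MUL_TABLE[x & NIBBLE_MASK][y & NIBBLE_MASK] for x, y in zip(xs, ys)]


def multiply_bit_serial(xs, ys):
    """Multiply element-wise by shift-and-add, one multiplier bit per step.

    Each step is applied to every entry, mimicking SIMD-style processing.
    """
    acc = [0] * len(xs)
    for bit in range(4):
        acc = [
            a + (((x & NIBBLE_MASK) << bit) if (y >> bit) & 1 else 0)
            for a, x, y in zip(acc, xs, ys)
        ]
    return acc


# Cover all 256 operand pairs across the 1,024 entries.
xs = [i % 16 for i in range(ENTRIES)]
ys = [(i // 16) % 16 for i in range(ENTRIES)]
products_lookup = multiply_table_lookup(xs, ys)
products_serial = multiply_bit_serial(xs, ys)
```

Both multiplication routines produce identical products, which is the functional equivalence the comparison in the abstract relies on; the difference between them in hardware lies in the number of clock cycles, which a software model like this cannot measure.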
