Exploring Various Levels of Parallelism in High-Performance CRC Algorithms
Modern processors offer increasing capabilities for instruction-level parallelism (ILP) and thread-level parallelism (TLP). Conventional cyclic redundancy check (CRC) algorithms, however, typically make poor use of these resources. In this paper, various levels of parallelism in high-performance CRC algorithms are investigated. The main idea of the proposed algorithms is to make full use of modern processors from the perspective of both instruction-level and thread-level parallelism. First, a fine-grained algorithm executes the CRC computation in an interleaved manner, so that multiple independent data flows can be processed simultaneously. This algorithm exploits instruction-level parallelism and triples the performance of the existing slicing-by-4 algorithm and doubles that of the existing slicing-by-8 algorithm. Second, a coarse-grained algorithm processes data in parallel by parallelizing a family of serial CRC generation algorithms. This algorithm exploits thread-level parallelism and can make full use of multi-core computing capability. As a result, it achieves a speedup that is almost equal to the number of threads used. In addition, the fine-grained and coarse-grained algorithms can be applied together to raise throughput further. (This is an extended version of a paper that appeared at the 28th Annual International Symposium on Personal, Indoor, and Mobile Radio Communications (PIMRC) in Montreal, QC, Canada, in 2017.)
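The coarse-grained idea can be illustrated with a minimal Python sketch. The abstract does not state how the per-thread results are merged, so the sketch assumes the standard linearity property of CRCs, the same idea behind the `crc32_combine` function in the zlib C library. The names `shift_crc`, `combine`, and `parallel_crc32` are hypothetical and are not taken from the paper. For brevity, `shift_crc` advances the register one zero byte at a time; a practical implementation would likely use a logarithmic-time method instead.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

POLY = 0xEDB88320  # reflected CRC-32 polynomial (the one zlib.crc32 uses)

# Byte-wise lookup table for the raw CRC register update.
TABLE = []
for byte in range(256):
    reg = byte
    for _ in range(8):
        reg = (reg >> 1) ^ POLY if reg & 1 else reg >> 1
    TABLE.append(reg)


def shift_crc(crc, num_zero_bytes):
    """Advance a CRC value past num_zero_bytes zero bytes.

    No initial value or final XOR is applied here; this is the plain
    linear register update. Hypothetical helper, O(n) for simplicity.
    """
    for _ in range(num_zero_bytes):
        crc = TABLE[crc & 0xFF] ^ (crc >> 8)
    return crc


def combine(crc_a, crc_b, len_b):
    """Return the CRC-32 of A||B given crc32(A), crc32(B) and len(B)."""
    return shift_crc(crc_a, len_b) ^ crc_b


def parallel_crc32(data, num_threads=4):
    """Split data into chunks, checksum each chunk in a thread, then merge."""
    if not data:
        return zlib.crc32(b"")
    chunk_size = -(-len(data) // num_threads)  # ceiling division
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partial = list(pool.map(zlib.crc32, chunks))
    result = partial[0]
    for crc, chunk in zip(partial[1:], chunks[1:]):
        result = combine(result, crc, len(chunk))
    return result


if __name__ == "__main__":
    payload = bytes(range(256)) * 4096
    # The chunked result should match the serial checksum.
    print(parallel_crc32(payload, num_threads=4) == zlib.crc32(payload))
```

Because each chunk is checksummed independently, the per-chunk work has no data dependencies between threads; only the final merge step is serial.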