Sequential and Parallel Code Sections are Different: they may require different Processors

Amdahl's law [1] states that a program cannot run faster than its serial section, even with infinite processing resources. Therefore, to obtain optimal performance in the many-core era, we should exploit both Thread-Level Parallelism (TLP) and Instruction-Level Parallelism (ILP): TLP by extracting more parallelism, and ILP by making sequential cores faster. An application parallelized using the shared-memory model can be divided into (1) a serial section that runs on only one core and (2) parallel sections that run simultaneously on multiple cores. In this paper, we characterize the inherent program behavior of the serial and parallel sections of currently available multi-threaded applications to identify the differences between them. Our analysis shows that the micro-architectural resource requirements of these two kinds of section differ, supporting the view that heterogeneous designs with a few complex cores and many small cores will benefit most applications in the many-core era.
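The bound from Amdahl's law can be made concrete with a small numerical sketch. The code below is illustrative only and is not taken from the paper: the function names, the `serial_boost` parameter (a hypothetical factor by which a more complex core speeds up the serial section), and the 90% parallel fraction are all assumptions chosen for the example.

```python
def amdahl_speedup(parallel_fraction: float, cores: int,
                   serial_boost: float = 1.0) -> float:
    """Speedup over a single baseline core under Amdahl's law.

    parallel_fraction: share of the work that scales with the core count.
    cores:             number of cores running the parallel sections.
    serial_boost:      hypothetical factor by which a faster (more complex)
                       core accelerates the serial section; 1.0 means none.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1 or serial_boost <= 0.0:
        raise ValueError("cores must be >= 1 and serial_boost must be > 0")
    serial_time = (1.0 - parallel_fraction) / serial_boost
    parallel_time = parallel_fraction / cores
    return 1.0 / (serial_time + parallel_time)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of cores grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.9  # assumed parallel fraction, chosen only for illustration
    for n in (1, 4, 16, 64, 1024):
        plain = amdahl_speedup(p, n)
        boosted = amdahl_speedup(p, n, serial_boost=2.0)
        print(f"cores={n:5d}  speedup={plain:6.2f}  "
              f"with 2x faster serial core={boosted:6.2f}")
    print(f"limit for p={p}: {amdahl_limit(p):.2f}")
```

In this toy model, adding cores yields diminishing returns once the serial section dominates, whereas a faster serial core raises the ceiling itself, which is the intuition behind pairing a few complex cores with many small ones.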

[1] Zhe Wang, et al. Studying microarchitectural structures with object code reordering, 2009, WBIA '09.

[2] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.

[3] David W. Wall, et al. Limits of instruction-level parallelism, 1991, ASPLOS IV.

[4] G. Amdahl, et al. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).

[5] Dominique Lavenier, et al. PLAST: parallel local alignment search tool for database comparison, 2009, BMC Bioinformatics.

[6] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[7] Monica S. Lam, et al. Limits of control flow on parallelism, 1992, ISCA '92.

[8] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).

[9] André Seznec, et al. A new case for the TAGE branch predictor, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).

[10] Andrew A. Chien, et al. The future of microprocessors, 2011, Commun. ACM.

[11] Lieven Eeckhout, et al. Microarchitecture-Independent Workload Characterization, 2007, IEEE Micro.

[12] Scott B. Baden, et al. Redefining the Role of the CPU in the Era of CPU-GPU Integration, 2012, IEEE Micro.

[13] André Seznec, et al. Impact of Serial Scaling of Multi-threaded Programs in Many-Core Era, 2014, 2014 International Symposium on Computer Architecture and High Performance Computing Workshop.

[14] Mark D. Hill, et al. Amdahl's Law in the Multicore Era, 2008.