T-Star (T*): An x86-64 ISA Extension to Support Thread Execution on Many Cores

The number of cores per chip keeps increasing in order to improve performance while keeping power consumption under control. According to semiconductor roadmaps, future computing systems will reach the scale of one tera-device in a single package, and therefore many-core chips (e.g., 1000 cores or more) will be the norm. Here, we describe an ISA extension (ISE) for the x86-64 ISA that we are experimenting with in order to provide efficient, fast support for fine-grained threads. The new ISE enables a different execution model, based on the availability of data, and opens the door to many architectural optimizations that are not possible in current cores. We also describe the architectural support required by the T* extension.
