Parallel Task Developing Based on Software Pipeline in Multicore System

The development of multi-core systems is driving a shift in programming toward parallelism. A software pipeline is an efficient way to structure parallel tasks, because a system commonly comprises several subtasks, each of which performs a specific function. When a subtask finishes, its intermediate state and data are saved for a period of time while the next subtask takes over, and this continues until the end of the pipeline is reached. Using a real system, a web proxy, as a case study, we partition the whole system into several pipelines, each of which consists of multiple subtasks. On this basis, we apply a series of optimizations that greatly improve system performance.
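The pipeline structure described in the abstract can be illustrated with a minimal sketch. It is not the implementation from the paper: the three proxy-like subtasks (`parse_request`, `lookup_cache`, `build_reply`) and the helpers `stage` and `run_pipeline` are hypothetical names chosen for illustration. Each subtask runs in its own thread, and the queue between two stages plays the role of the saved intermediate data.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def stage(func, inbox, outbox):
    """Run one subtask: take an item, process it, hand the result on."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(func(item))


def run_pipeline(funcs, items):
    """Connect the subtasks with queues and run each in its own thread."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results


# Hypothetical subtasks, loosely modelled on a web proxy (assumption).
def parse_request(line):
    method, url = line.split()
    return {"method": method, "url": url}


def lookup_cache(req):
    # Pretend that only .html pages are held in the cache.
    req["cached"] = req["url"].endswith(".html")
    return req


def build_reply(req):
    source = "cache" if req["cached"] else "origin"
    return f"{req['method']} {req['url']} served from {source}"


if __name__ == "__main__":
    subtasks = [parse_request, lookup_cache, build_reply]
    for reply in run_pipeline(subtasks, ["GET /a.html", "GET /b.png"]):
        print(reply)
```

While one request is in `lookup_cache`, the next can already be in `parse_request`, which is where the overlap comes from. In CPython, threads do not run CPU-bound Python code in parallel on several cores, so this sketch shows only the structure; a multicore implementation would more likely use processes or a language with native threads.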
