Predicting communication protocol performance on superscalar architectures using instruction dependency

Increasing diversity in telecommunication workloads is making communication protocols more complex, even as channel bandwidth rapidly increases. Together, these factors place larger computational loads on network processors, which are increasingly turning to high-performance microprocessor designs. This paper presents an analytical method for estimating the performance of instruction-level parallel (ILP) processors executing network protocol processing applications. Instruction dependency information extracted while executing an application is used to calculate upper and lower bounds for throughput, measured in instructions per cycle (IPC). Results using UDP/TCP/IP applications show that the simulated IPC values fall between the analytically derived upper and lower bounds, validating the model. The analytical method is much less expensive than cycle-accurate simulation, yet yields similar throughput predictions. This allows the architectural design space for network superscalar processors to be explored more rapidly and comprehensively, revealing the maximum IPC that is possible for a given application workload and the available hardware resources.
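To make the idea of bounding IPC from dependency information concrete, the following is a minimal Python sketch. It is not the model derived in the paper, whose formulas are not given in this abstract. It assumes a hypothetical trace format (each instruction lists the indices of the earlier instructions that produce its operands), unit instruction latency, and no cache or branch effects. The optimistic figure is limited by the issue width and the dependency critical path; the pessimistic figure comes from a simple in-order issue schedule, used here only as a stand-in for a lower bound.

```python
from typing import List, Sequence

Trace = Sequence[Sequence[int]]  # hypothetical format: deps[i] = producers of instruction i


def _check(deps: Trace) -> None:
    """Every producer must appear earlier in the trace than its consumer."""
    for i, producers in enumerate(deps):
        for p in producers:
            if not 0 <= p < i:
                raise ValueError(f"instruction {i} has invalid producer {p}")


def dataflow_depths(deps: Trace) -> List[int]:
    """Earliest completion cycle of each instruction with unit latency
    and unlimited hardware (pure dataflow limit)."""
    _check(deps)
    depth: List[int] = []
    for producers in deps:
        depth.append(1 + max((depth[p] for p in producers), default=0))
    return depth


def ipc_optimistic(deps: Trace, issue_width: int) -> float:
    """IPC can exceed neither the issue width nor N / critical-path length."""
    if not deps:
        return 0.0
    critical_path = max(dataflow_depths(deps))
    return min(float(issue_width), len(deps) / critical_path)


def ipc_in_order(deps: Trace, issue_width: int) -> float:
    """Pessimistic estimate: issue strictly in program order and stall at the
    first instruction whose producers have not issued in an earlier cycle."""
    _check(deps)
    n = len(deps)
    if n == 0:
        return 0.0
    issue_cycle = [0] * n
    cycle = 0
    i = 0
    while i < n:
        cycle += 1
        issued = 0
        while (i < n and issued < issue_width
               and all(issue_cycle[p] < cycle for p in deps[i])):
            issue_cycle[i] = cycle
            i += 1
            issued += 1
    return n / cycle


if __name__ == "__main__":
    # Illustrative eight-instruction trace (made up, not from the paper).
    trace = [[], [], [0, 1], [2], [], [4], [3, 5], []]
    print(ipc_in_order(trace, 4), ipc_optimistic(trace, 4))
```

Because any valid schedule needs at least as many cycles as the critical path and at least N divided by the issue width, the in-order figure can never exceed the optimistic one. A cycle-accurate simulation run under the same simplifying assumptions would be expected to land between the two.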
