On bottleneck analysis in stochastic stream processing

Improvements in clock frequency have traditionally come from technology scaling, but the most recent technology nodes no longer offer such gains. Instead, parallelism has emerged as the key driver of chip-performance growth. Unfortunately, efficient simultaneous use of on-chip resources is hampered by sequential dependencies, as illustrated by Amdahl's law. Quantifying achievable parallelism through provable mathematical results can help prevent futile programming efforts and guide innovation in computer architecture toward the most significant challenges. To complement Amdahl's law, we focus on stream processing and quantify the performance losses caused by stochastic runtimes. Using the spectral theory of random matrices, we derive new analytical results and validate them with numerical simulations. These results allow us to explore the unique benefits of stochasticity and to show how and when they outweigh its costs for software streams.
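To make the notion of a performance loss due to stochastic runtimes concrete, the following minimal sketch simulates a linear pipeline of stages with unbounded buffers between them. It is an illustration under stated assumptions, not the model or method of the paper: the function name `completion_time`, the saturated source (all items available at time 0), and the choice of exponentially distributed stage times are all hypothetical. It relies on the standard tandem-queue recursion in which an item leaves a stage only after it has left the previous stage and the preceding item has left the current stage.

```python
import random


def completion_time(num_items, num_stages, sample_service, rng):
    """Time at which the last item leaves the last stage of a linear pipeline.

    Assumptions (illustrative only): all items are available at time 0,
    buffers between stages are unbounded, and each stage handles one item
    at a time. Departure times follow
        D[i][j] = max(D[i-1][j], D[i][j-1]) + S[i][j],
    where S[i][j] is the runtime of item i on stage j.
    """
    # prev[j] holds the departure time of the previous item from stage j;
    # index 0 is a sentinel for the source, which is always ready at time 0.
    prev = [0.0] * (num_stages + 1)
    for _ in range(num_items):
        cur = [0.0] * (num_stages + 1)
        for j in range(1, num_stages + 1):
            cur[j] = max(prev[j], cur[j - 1]) + sample_service(rng)
        prev = cur
    return prev[num_stages]


if __name__ == "__main__":
    n, m = 200, 20
    # Deterministic runtimes of 1.0 per stage.
    fixed = completion_time(n, m, lambda r: 1.0, random.Random(0))
    # Exponential runtimes with the same mean of 1.0.
    noisy = completion_time(n, m, lambda r: r.expovariate(1.0), random.Random(0))
    print(f"deterministic: {fixed:.1f}")
    print(f"stochastic:    {noisy:.1f}")
    print(f"slowdown:      {noisy / fixed:.2f}x")
```

With deterministic unit runtimes the pipeline finishes after exactly `num_items + num_stages - 1` time units. With random runtimes of the same mean, the maximum in the recursion lets slow stages delay their neighbours, so the completion time is larger on average.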
