An Optimal Time-Power Tradeoff for Sorting on a Mesh-Connected Computer with On-Chip Optics

Energy consumption has become a critical factor constraining the design of massively parallel computers, necessitating the development of new models and energy-efficient algorithms. The primary component of on-chip energy consumption is data movement, and the mesh computer is a natural model of this, explicitly taking distance into account. Unfortunately, the dark silicon problem increasingly constrains the number of bits that can be moved simultaneously. For sorting, standard mesh algorithms minimize both time and total data movement, so constraining the mesh to use only half its processors at any instant must double the time. On-chip optics are expected to reduce the energy needed to move bits, but they impose constraints on layout. In an abstract model, we show that a pyramidal layout combined with a new power-aware algorithm allows one to sort with only a square root increase in time as the fraction of processors simultaneously powered decreases. Furthermore, this layout is shown to be optimal in terms of the time-power tradeoff required for sorting. Previous algorithms assumed fully powered systems, so pyramid sorting was of no interest: when fully powered, pyramids are no faster than the base mesh. Our results establish asymptotic theoretical limits of computation and energy usage in a model that takes physical constraints and emerging interconnection technology into account.
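The "half the processors must double the time" argument can be illustrated with a small data-movement count. The sketch below assumes a simplified model that is not taken from the paper: a side × side mesh in row-major order, where each powered processor moves at most one item one hop per step and at most a fraction `f` of the n processors are powered at once. Sorting a reverse-ordered input forces every item to its mirror position, and for even `side` the total Manhattan distance is side³ = n^{3/2}, so the step lower bound scales as 1/f. The `pyramid` entry simply encodes the abstract's claimed square-root behavior (1/√f) and is not derived here; all function names are hypothetical and constant factors are ignored.

```python
import math


def reversal_total_distance(side: int) -> int:
    """Total Manhattan distance when each item on a side x side mesh
    (row-major order) must move from position i to position n-1-i,
    as happens when a reverse-sorted input is sorted ascending."""
    total = 0
    for r in range(side):
        for c in range(side):
            # Position r*side + c maps to (side-1-r)*side + (side-1-c).
            total += abs(r - (side - 1 - r)) + abs(c - (side - 1 - c))
    return total


def movement_lower_bound_steps(side: int, f: float) -> float:
    """Lower bound on steps, assuming at most f*n powered processors
    each move one item across one link per step (simplified model)."""
    n = side * side
    return reversal_total_distance(side) / (f * n)


def slowdown(f: float) -> dict:
    """Time relative to fully powered operation: 1/f for the standard mesh
    (from the data-movement bound), 1/sqrt(f) as claimed for the pyramid."""
    return {"mesh": 1 / f, "pyramid": 1 / math.sqrt(f)}


if __name__ == "__main__":
    for side in (4, 8, 16):
        full = movement_lower_bound_steps(side, 1.0)
        half = movement_lower_bound_steps(side, 0.5)
        print(f"side={side}: f=1 -> {full} steps, f=1/2 -> {half} steps")
    print(slowdown(1 / 16))
```

Halving the powered fraction doubles the mesh bound for every mesh size, while at f = 1/16 the claimed pyramid slowdown is 4 rather than 16.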
