On Characterizing the Data Access Complexity of Programs

Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Computational complexity will therefore no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will become increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong and Kung's red/blue pebble game on computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower-bound analysis for the red/blue pebble game are of very limited effectiveness when applied to the CDAGs of real programs, whose computations comprise multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm that derives asymptotic data-access lower bounds of programs as a function of the problem size and the cache size.
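
To make the red/blue pebble game concrete, the Python sketch below counts the I/O (loads plus stores) of one particular pebbling of a small CDAG with at most S red pebbles, i.e. S words of fast memory. The helper `simulate_red_blue` is hypothetical and not taken from this work. It assumes the following:

- A fixed topological schedule.
- No recomputation.
- A furthest-next-use eviction heuristic that is not guaranteed to be optimal.

Because every move it makes is a legal game move, any cost it reports is only an upper bound on the optimal I/O for that CDAG and S. The lower bounds discussed above are the complementary guarantee: no pebbling, however scheduled, can cost less than some function of the problem size and S.

```python
from bisect import bisect_right
from typing import Dict, List, Set


def simulate_red_blue(preds: Dict[str, List[str]], order: List[str],
                      outputs: Set[str], S: int) -> int:
    """Return the I/O count (loads + stores) of one red/blue pebbling.

    Hypothetical helper. preds maps every vertex to its predecessors;
    vertices with no predecessors are inputs and start with blue pebbles
    (slow memory). order is a topological order of all non-input vertices,
    each computed exactly once (no recomputation). At most S red pebbles
    (fast-memory words) are in use at any time.
    """
    blue: Set[str] = {v for v, p in preds.items() if not p}
    red: Set[str] = set()
    # uses[u] = ascending steps at which u is read as an operand.
    uses: Dict[str, List[int]] = {v: [] for v in preds}
    for t, v in enumerate(order):
        for u in set(preds[v]):
            uses[u].append(t)
    io = 0

    def next_use(u: str, t: int) -> float:
        # First step after t that reads u, or infinity if u is dead.
        i = bisect_right(uses[u], t)
        return uses[u][i] if i < len(uses[u]) else float("inf")

    def make_room(t: int, keep: Set[str]) -> None:
        nonlocal io
        while len(red) >= S:
            # Heuristic (assumption): evict the value reused furthest ahead,
            # preferring values already in slow memory, ties broken by name.
            victim = max(red - keep,
                         key=lambda u: (next_use(u, t), u in blue, u))
            still_needed = next_use(victim, t) != float("inf") or victim in outputs
            if still_needed and victim not in blue:
                blue.add(victim)  # store: spill the value to slow memory
                io += 1
            red.discard(victim)  # delete: free the red pebble

    for t, v in enumerate(order):
        operands = set(preds[v])
        if len(operands) + 1 > S:
            raise ValueError(f"S={S} is too small to compute {v}")
        keep = operands | {v}
        for u in sorted(operands):
            if u not in red:
                make_room(t, keep)
                red.add(u)  # load: u is guaranteed to hold a blue pebble
                io += 1
        make_room(t, keep)
        red.add(v)  # compute: all operands hold red pebbles

    # Every output must finish with a blue pebble.
    for o in outputs:
        if o not in blue:
            blue.add(o)
            io += 1
    return io


if __name__ == "__main__":
    # Reduction tree g = (a + b) + (c + d); a..d are inputs.
    preds = {"a": [], "b": [], "c": [], "d": [],
             "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
    for S in (3, 4):
        print(S, simulate_red_blue(preds, ["e", "f", "g"], {"g"}, S))
    # prints "3 7" then "4 5"
```

With S = 4, this schedule achieves the trivial cost of loading the four inputs and storing the single output. With S = 3, keeping `e` live while computing `f` would need four red pebbles, so the sketch spills `e` and reloads it. Computing the exact optimum would instead require searching over all schedules and pebbling moves, which is feasible only for tiny graphs. That limitation is why analytical lower bounds are needed.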
