Sparse Matrix Vector Multiplication on Distributed Architectures: Lower Bounds and Average Complexity Results

In this paper we consider the problem of computing y = Ax where A is an n x n sparse matrix with Θ(n) nonzero elements. We prove that, under reasonable assumptions, on a local memory machine with p processors this computation requires Ω((n/p) log p) time. We also study the average complexity of this problem: we prove that for an important class of algorithms the computation of y = Ax requires Ω((n/p) log p) time with probability greater than 12.