Implementing sparse BLAS primitives on concurrent/vector processors: a case study