A SnuCL implementation of the LINPACK benchmark on clusters with multi-GPU nodes