Optimizing massively parallel sparse matrix computing on ARM many-core processor