Comparison on the performance of Lattice Boltzmann method solver executed on multi-node GPU cluster by multi-dimensional domain decompositions