Memory optimization for a parallel sorting hardware architecture