Generating Diverse and Natural 3D Human Motions from Text