Optimizing the Wilson Dslash Kernel from Lattice QCD