Can We Gain More from Orthogonality Regularizations in Training Deep Networks?