Toward Understanding Generalization of Over-parameterized Deep ReLU network trained with SGD in Student-teacher Setting