With the rapid development of face manipulation technology, it is difficult for human eyes to distinguish fake face images. On the contrary, Convolutional Neural Network (CNN) discriminators can quickly reach high accuracy in identifying fake/real face images. In this study, we explore the behavior of CNN models in distinguish fake/real faces. We find multi-scale texture difference information plays an important role in face forgery detection. Motivated by the above observation, we propose a new Multi-scale Texture Difference model coined as MTD-Net for robust face forgery detection, which leverages central difference convolution (CDC) and atrous spatial pyramid pooling (ASPP). CDC combines the pixel intensity information and the pixel gradient information to give a stationary description of texture difference information. Simultaneously, based on the ASPP, multi-scale information fusion can keep the texture features from being destroyed. Experimental results on several databases, Faceforensics++, DeeperForensics-1.0, Celeb-DF and DFDC prove that our MTD-Net outperforms existing approaches. The MTD-Net is more robust to image distortion, e.g., JPEG compression and blur, which is urgently needed in the wild world.