Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training