Examining Scaling and Transfer of Language Model Architectures for Machine Translation