FashionViL: Fashion-Focused Vision-and-Language Representation Learning