BUFFET: Benchmarking Large Language Models for Few-shot Cross-lingual Transfer