FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets