On the Safety of Conversational Models: Taxonomy, Dataset, and Benchmark