Tapilot-Crossing: Benchmarking and Evolving LLMs Towards Interactive Data Analysis Agents