Benchmarking single-arm studies against historical controls from non-small cell lung cancer trials – an empirical analysis of bias