Benchmarking weakly-supervised deep learning pipelines for whole slide classification in computational pathology