Comparing the diagnostic performance of methods used in full-factorial design multi-reader multi-case studies