Machine learning using synthetic and real data: Similarity of evaluation metrics for different healthcare datasets and for different algorithms