An approach for selecting small sets of diagnosis codes with high prediction performance in large datasets of electronic medical records