The Handling of Missing Values in Medical Domains with Respect to Pattern Mining Algorithms

Missing values are a widespread problem in the analysis of large data sets. In the medical domain they are unavoidable, and complete-case analysis methods fail there. In this paper we give an overview of the kinds of missingness and of common methods for handling missing values in machine learning algorithms. We introduce the Charite Query Language Toolkit, which was developed to find similar patterns in the data records of post-kidney-transplant patients. The toolkit uses available-case analysis methods combined with preprocessing of missing values as a compromise between simplicity and functionality.