The efficiency of classification in imperfect databases: comparing kNN and correlation clustering

In everyday life it is often too expensive, or simply impossible, to determine every attribute of an object. It is no surprise, for example, that hospitals perform a CT or MRI scan only in particular circumstances. Missing data is therefore a common problem in classification, affecting both the objects that form the existing classes and the objects still to be classified. Some algorithms try to fill in the missing information and base the classification on the completed data, while others use only the data that actually exists. In this article we compare two algorithms: the well-known kNN and one based on correlation clustering. Whereas the former extrapolates from the accessible data, the latter uses only the existing data.

Keywords: classification, kNN, missing data, correlation clustering

MSC: 68W20, 62H30, 91C20