Semi-Supervised Vector Quantization for Proximity Data

Semi-supervised learning (SSL) focuses on learning from labeled and unlabeled data by incorporating structural and statistical information of the available unlabeled data. The amount of data is increasing dramatically, but only a small fraction of it is fully labeled, due to cost and time constraints. This is even more challenging for non-vectorial proximity data, given by pairwise proximity values. Only a few methods provide SSL for such data, and they are limited to positive semi-definite (psd) data. They also lack interpretable models, which is a relevant aspect in the life sciences, where most of these data are found. This paper provides a prototype-based SSL approach for proximity data.