EXPLORING THE HUBNESS-RELATED PROPERTIES OF OCEANOGRAPHIC SENSOR DATA

In this paper we examine how the high dimensionality of oceanographic sensor data impacts the potential use of nearest-neighbor machine learning methods. In this paper, we focus on one particular consequence of the curse of dimensionality – hubness. We examine the hubness of oceanographic data and show how it can be used to visualize and detect both prototypical sensors/locations as well as ambiguous and potentially erroneous ones. We proceed to define an easy classification problem on the data, showing that the recently developed hubness-aware classification methods may help to overcome some of the hubness-related issues in the data.