Error detection of oceanic observation data using sequential labeling

Argo, a global ocean monitoring system of more than 3,600 small, lightweight drifting buoys, continuously measures ocean temperature and salinity. The accumulated observation data support many studies, such as investigations into the mechanisms of climate change. Although human experts visually check and revise quality control (QC) labels, it is difficult to keep the quality of ocean observation data consistent worldwide. This paper therefore proposes a method for error detection in Argo observation data, aiming at automatic QC with accuracy comparable to that of human experts. The target dataset is imbalanced, and accurate labeling at each depth requires considering the sequences of both features and quality labels. The proposed method uses a Conditional Random Field (CRF) to assign quality labels to observed temperature and salinity values, and adopts a Support Vector Machine (SVM) to design a feature function for numerical attributes. Experimental results show that the proposed method assigns QC labels more accurately than both a point-wise prediction method using an SVM and the system actually operated in the Argo project.
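To illustrate why labeling a whole depth sequence can differ from point-wise prediction, the following minimal sketch decodes a linear-chain model with the standard Viterbi algorithm. It is not the implementation used in this paper: the two labels, the per-depth scores (hypothetical stand-ins for an SVM-derived feature value), and the transition weights are all assumptions chosen for illustration.

```python
# Illustrative sketch only. Labels, scores, and weights are invented
# assumptions, not values or code from the paper.
LABELS = ["good", "bad"]


def viterbi(emission, transition):
    """Return the highest-scoring label index sequence of a linear-chain model.

    emission[t][y]   : score of label y at depth level t (assumed here to
                       stand in for an SVM-derived feature score)
    transition[a][b] : score for label a at one depth followed by label b
                       at the next depth
    """
    n_labels = len(transition)
    score = list(emission[0])  # best score of any path ending in each label
    back = []                  # back-pointers, one list per depth level t >= 1
    for t in range(1, len(emission)):
        new_score, pointers = [], []
        for y in range(n_labels):
            best_prev = max(range(n_labels),
                            key=lambda p: score[p] + transition[p][y])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + transition[best_prev][y]
                             + emission[t][y])
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    y = max(range(n_labels), key=lambda k: score[k])
    path = [y]
    for pointers in reversed(back):
        y = pointers[y]
        path.append(y)
    path.reverse()
    return path


def pointwise(emission):
    """Label each depth independently by its highest score."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in emission]


# Hypothetical profile with five depth levels; columns are (good, bad).
emission = [[2.0, 0.0], [0.0, 1.5], [0.6, 0.4], [0.0, 1.5], [2.0, 0.0]]
# Assumed weights that favor neighboring depths sharing the same label.
transition = [[0.5, -0.5], [-0.5, 0.5]]

print([LABELS[i] for i in pointwise(emission)])
# ['good', 'bad', 'good', 'bad', 'good']
print([LABELS[i] for i in viterbi(emission, transition)])
# ['good', 'bad', 'bad', 'bad', 'good']
```

In this toy profile the third depth level looks marginally good when judged alone, but the sequence model labels it bad because both neighboring depths are clearly bad. A trained CRF would learn such weights from labeled profiles rather than take them as fixed constants.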