Mapping Uncharted Waters: Exploratory Analysis, Visualization, and Clustering of Oceanographic Data

In this paper we describe an interdisciplinary collaboration between researchers in machine learning and oceanography, formed to study the problem of open ocean biome classification. Biomes are regions on Earth with similar climate (e.g., temperature and rainfall) and vegetation structure (e.g., grasslands, coniferous forests, and deserts). To discover biomes in the open ocean, we apply leading methods in high-dimensional data analysis, clustering, and visualization to oceanographic measurements drawn from multiple existing databases. We compare traditional approaches, such as k-means clustering and principal component analysis, with newer approaches, such as Isomap and maximum variance unfolding. Our work provides the first quantitative classification of open ocean biomes from an automated statistical analysis of multivariate data. It also provides a valuable case study in the use (and misuse) of recently developed algorithms for high-dimensional data analysis.
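To make the "traditional" baseline concrete, the following is a minimal sketch of principal component analysis followed by k-means clustering. It is not the authors' pipeline: the data are synthetic, the five columns are hypothetical stand-ins for oceanographic variables, and the function names `pca_project` and `kmeans` are illustrative. Isomap and maximum variance unfolding are not shown.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each variable
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # low-dimensional coordinates

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # squared Euclidean distance from every point to every center
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each center; keep the old one if a cluster is empty
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic stand-in data (assumption): 300 "stations", 5 measured variables,
# drawn around three hypothetical regime means.
rng = np.random.default_rng(42)
means = np.array([[0, 0, 0, 0, 0],
                  [10, 10, 0, 0, 0],
                  [0, 10, 10, 0, 0]], dtype=float)
X = np.vstack([m + rng.normal(scale=0.5, size=(100, 5)) for m in means])

X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize variables to unit scale
Z = pca_project(X, n_components=2)         # linear 2-D embedding
labels, centers = kmeans(Z, k=3)           # candidate "biome" labels per station
```

Standardizing before PCA keeps variables measured in different units from dominating the projection. Note that k-means with random initialization can converge to a local optimum, so in practice one would typically run it from several seeds and compare the results.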
