Comparison of Machine Learning Algorithms for Retrieval of Water Quality Indicators in Case-II Waters: A Case Study of Hong Kong

Anthropogenic activities in coastal regions are endangering marine ecosystems. Coastal waters classified as case-II waters are especially complex due to the presence of different constituents. Recent advances in remote sensing technology have made it possible to capture the spatiotemporal variability of these constituents in coastal waters. The present study evaluates the potential of remote sensing combined with machine learning techniques for improving water quality estimation over the coastal waters of Hong Kong. Concentrations of suspended solids (SS) and chlorophyll-a (Chl-a), as well as turbidity, were estimated with several machine learning techniques: Artificial Neural Network (ANN), Random Forest (RF), Cubist regression (CB), and Support Vector Regression (SVR). Landsat-5, -7, and -8 reflectance data were compared with in-situ reflectance data to evaluate the performance of the machine learning models. The highest accuracies for the water quality indicators were achieved by ANN for both the in-situ reflectance data (Chl-a 89%, SS 93%, turbidity 82%) and the satellite data (Chl-a 91%, SS 92%, turbidity 85%). The water quality parameters retrieved by the ANN model were further compared with those retrieved by C2RCC-Nets, the model of the standard Case-2 Regional/Coast Colour (C2RCC) processing chain. The root mean square error (RMSE) of the ANN estimates was 3.3 mg/L for SS and 2.7 µg/L for Chl-a, whereas applying C2RCC to Landsat-8 data gave RMSEs of 12.7 mg/L for suspended particulate matter (SPM) and 12.9 µg/L for Chl-a. A relative variable importance analysis was also conducted to investigate the consistency between the in-situ and satellite reflectance data, and the results show that the two datasets are similar.
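The abstract reports RMSE for the ANN and C2RCC retrievals and a relative variable importance analysis, but does not state how importance was computed. The Python sketch below shows the RMSE formula, RMSE = sqrt((1/n) Σ (yᵢ − ŷᵢ)²), together with permutation importance, one common model-agnostic way to obtain relative importance. Permutation importance is an assumed stand-in here, not necessarily the authors' method (RF and Cubist, for example, also have their own built-in importance measures), and the names `rmse`, `relative_permutation_importance`, and the `predict` callable are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence

Row = Sequence[float]


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_permutation_importance(
    predict: Callable[[Row], float],
    rows: List[Row],
    targets: List[float],
    seed: int = 0,
) -> List[float]:
    """Share (in %) of the total RMSE increase caused by shuffling each input column.

    Assumption: permutation importance is used as an illustrative stand-in
    for the study's (unspecified) relative variable importance measure.
    """
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(r) for r in rows])
    increases = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with only column j permuted; other inputs stay intact.
        permuted = [list(r[:j]) + [v] + list(r[j + 1:]) for r, v in zip(rows, column)]
        error = rmse(targets, [predict(r) for r in permuted])
        # Clip at zero: a shuffle that happens to help is treated as no importance.
        increases.append(max(error - baseline, 0.0))
    total = sum(increases)
    return [100.0 * (inc / total) if total > 0 else 0.0 for inc in increases]
```

With a fitted model wrapped as `predict`, the returned percentages could be compared between the in-situ and satellite datasets to check whether the same inputs dominate in both.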
The red band (wavelength ≈ 0.665 µm) and the product of the red and green (wavelength ≈ 0.560 µm) bands were influential inputs in both reflectance datasets for estimating SS and turbidity, whereas the ratio between the red and blue (wavelength ≈ 0.490 µm) bands, as well as the ratio between the infrared band (wavelength ≈ 0.865 µm) and the blue and green bands, proved more useful for estimating Chl-a concentration, owing to their sensitivity to high turbidity in the coastal waters. The results indicate that the ANN-based machine learning approach performs best and can therefore be used for improved water quality monitoring with satellite data in optically complex coastal waters.
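A minimal Python sketch of how the band inputs named above could be derived from per-band reflectance, assuming the reflectance is already atmospherically corrected and available per sample. The `Reflectance` class, the `band_features` function, and the feature names are hypothetical, and computing the infrared ratios separately against the blue and green bands is one reading of the text, not a confirmed detail of the study.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Reflectance:
    """Reflectance of one pixel or in-situ sample (unitless); hypothetical container."""
    blue: float   # ≈ 0.490 µm
    green: float  # ≈ 0.560 µm
    red: float    # ≈ 0.665 µm
    nir: float    # ≈ 0.865 µm


def band_features(r: Reflectance) -> Dict[str, float]:
    """Derive the band and band-combination inputs named in the abstract.

    Assumption: the infrared ratio is computed separately against blue and
    green, which is one interpretation of the original wording.
    """
    if min(r.blue, r.green) <= 0:
        raise ValueError("blue and green reflectance must be positive to form ratios")
    return {
        "red": r.red,                       # reported as influential for SS and turbidity
        "red_x_green": r.red * r.green,     # reported as influential for SS and turbidity
        "red_over_blue": r.red / r.blue,    # reported as useful for Chl-a
        "nir_over_blue": r.nir / r.blue,    # reported as useful for Chl-a (assumed form)
        "nir_over_green": r.nir / r.green,  # reported as useful for Chl-a (assumed form)
    }
```

Each sample's feature dictionary could then be turned into one input row for any of the regressors compared in the study (ANN, RF, Cubist, SVR); keeping the derivation in one function makes it easy to apply identically to the in-situ and Landsat reflectance.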
