Is Clustering Time-Series Water Depth Useful? An Exploratory Study for Flooding Detection in Urban Drainage Systems

As sensor measurements become available in urban water systems, data-driven unsupervised machine learning algorithms have recently drawn considerable interest for event detection and for the prediction of hydraulic water levels and flows. However, most of these algorithms have been applied to water distribution systems, and few studies have used unsupervised cluster analysis to group time-series hydraulic-hydrologic data in urban stormwater drainage systems. To improve the understanding of how cluster analysis contributes to detecting flooding locations, this study compares the performance of K-means clustering, agglomerative clustering, and spectral clustering in uncovering dissimilarity in time-series water depth. The water depth datasets are simulated by an urban drainage model and then formatted for a clustering problem. Three standard performance evaluation metrics, namely the silhouette coefficient index, the Calinski–Harabasz index, and the Davies–Bouldin index, are employed to assess clustering performance in flooding detection under various storms. The results show that the silhouette coefficient index and the Davies–Bouldin index are more suitable for assessing the performance of K-means and agglomerative clustering, while the Calinski–Harabasz index only works for spectral clustering, indicating that these clustering algorithms are metric-dependent flooding indicators. The results also reveal that agglomerative clustering performs better in detecting short-duration events, while K-means and spectral clustering perform better in detecting long-duration floods. These findings can support urban stormwater flood detection at specific junction-level sites, where anomalous changes in the water level of correlated clusters can serve as an early flood warning for local neighborhoods.
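The comparison described above can be illustrated with a minimal sketch. It assumes scikit-learn and NumPy, and it substitutes hypothetical synthetic depth hydrographs for the output of the urban drainage model, so the junction groups, peak depths, noise levels, and the `gamma` value are illustrative assumptions rather than the study's actual setup. Each row is one junction and each column is one time step. The three algorithms are fitted with the same number of clusters and scored with the three metrics named in the abstract.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)


def make_synthetic_depths(n_per_group=20, n_steps=48, seed=0):
    """Hypothetical stand-in for simulated junction water depths (metres)."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_steps)
    pulse = np.exp(-(((t - 20) / 6.0) ** 2))  # one storm-driven depth pulse
    rows = []
    for mean_peak in (2.0, 0.5):  # assumed flood-prone vs. non-flooding groups
        peaks = rng.normal(mean_peak, 0.1, size=n_per_group)
        noise = rng.normal(0.0, 0.05, size=(n_per_group, n_steps))
        rows.append(0.2 + peaks[:, None] * pulse + noise)
    return np.vstack(rows)  # shape: (junctions, time steps)


def evaluate_clusterings(depths, n_clusters=2, seed=0):
    """Fit the three algorithms and score each labeling with the three metrics."""
    models = {
        "kmeans": KMeans(n_clusters=n_clusters, n_init=10, random_state=seed),
        "agglomerative": AgglomerativeClustering(
            n_clusters=n_clusters, linkage="ward"
        ),
        # gamma is an assumed kernel width chosen for this toy data scale.
        "spectral": SpectralClustering(
            n_clusters=n_clusters, affinity="rbf", gamma=0.1, random_state=seed
        ),
    }
    results = {}
    for name, model in models.items():
        labels = model.fit_predict(depths)
        results[name] = {
            # Silhouette lies in [-1, 1]; higher means better-separated clusters.
            "silhouette": silhouette_score(depths, labels),
            # Calinski-Harabasz: higher is better.
            "calinski_harabasz": calinski_harabasz_score(depths, labels),
            # Davies-Bouldin: lower is better, with 0 as the minimum.
            "davies_bouldin": davies_bouldin_score(depths, labels),
        }
    return results


depths = make_synthetic_depths()
scores = evaluate_clusterings(depths)
for name, metrics in scores.items():
    print(name, {key: round(float(value), 3) for key, value in metrics.items()})
```

Because the three metrics point in different directions (higher is better for the silhouette and Calinski–Harabasz indices, lower is better for the Davies–Bouldin index), they are reported side by side here rather than combined into a single score.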
