Long-Term Pipeline Failure Prediction Using Nonparametric Survival Analysis

Australian water infrastructure is more than a hundred years old and has begun to show its age through water main failures. Our work concerns approximately half a million pipelines across major Australian cities that deliver water to houses and businesses, serving over five million customers. Failures of these buried assets damage property and disrupt water supply. We apply machine learning techniques to find a cost-effective solution to the pipe failure problem in these cities, where on average 1,500 water main failures occur each year. To this end, we build a detailed picture of the behaviour of the water pipe network by developing a machine learning model that assesses and predicts the likelihood of water main failure from historical failure records, pipe descriptors, and environmental factors. Our results indicate that our system, which incorporates a nonparametric survival analysis technique called "Random Survival Forest", outperforms several popular algorithms and expert heuristics in long-term prediction. In addition, we construct a statistical inference technique to quantify the uncertainty associated with the long-term predictions.
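To make the idea of nonparametric survival analysis on failure records concrete, the sketch below computes a Kaplan-Meier survival curve for a handful of pipes. It is not the Random Survival Forest used in this work; Kaplan-Meier is a simpler nonparametric estimator chosen here only for illustration. The function name `kaplan_meier`, the variable names, and the toy ages are all assumptions made for the example, not data from the study.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: years in service until failure or end of observation
    observed:  True if the pipe failed, False if it was still intact
               when observation ended (right-censored)
    Returns a list of (time, survival probability) at each failure time.
    """
    failures = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # pipes leaving the risk set at each time
    at_risk = len(durations)        # pipes still under observation
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = failures.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk pipes surviving past t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical toy data: pipe age in years and whether a failure was recorded.
ages = [12, 30, 30, 45, 60, 60, 75, 80]
failed = [True, True, False, True, False, True, False, False]

curve = kaplan_meier(ages, failed)
for t, s in curve:
    print(f"age {t:>2}: estimated survival {s:.3f}")
```

Pipes that have not yet failed still contribute to the risk set until their last observed age, which is how survival methods use incomplete failure histories instead of discarding them.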
