LEVERAGING DATA ANALYTICS TO IDENTIFY SUBSURFACE LEAKS IN JACKSON, MISSISSIPPI’S RECOVERING WATER DISTRIBUTION SYSTEM

Date

2025-04-25

Abstract

The City of Jackson, Mississippi has long faced critical water infrastructure challenges. Between 2017 and 2021, over 7,300 water main breaks were recorded. This staggering rate of failure was 3.7 times higher than the national industry average. Sustained infrastructure deterioration culminated in a full system collapse in August 2022, when severe storms disabled the city’s water treatment facilities, leaving approximately 150,000 residents without safe drinking water. In response, a federal emergency was declared, and JXN Water was created as a court-appointed third-party manager to lead recovery efforts. Since then, JXN Water has focused on immediate repairs, digital mapping, operational modeling, and implementation of an asset management system. These advancements have stabilized the system and opened pathways for innovative approaches to long-term problem solving.

One of the most significant ongoing challenges in Jackson’s water system is the high volume of non-revenue water (NRW), water that is treated and distributed but never reaches customers due to system leaks. Reducing NRW is crucial for financial sustainability, environmental protection, and restoring public trust. While field-based leak detection technologies such as drone thermal imagery and acoustic sensors have proven valuable, this study explores a data-driven alternative: using machine learning (ML) to proactively predict where leaks are likely to occur.

The research investigates whether a Random Forest Classifier (RFC) machine learning model can accurately predict leaking versus non-leaking pipes in Jackson’s distribution network. The RFC works by aggregating predictions from decision trees trained on different subsets of data to classify outcomes. This study leveraged over 41,000 water mains in a selected service area and used confirmed leak data from 813 recorded breaks between June 2024 and March 2025.
To address the dataset’s imbalance (only about 2% of the points were leaks), oversampling and undersampling techniques were applied to improve model training. Seven variables describing each water main’s physical and operational characteristics were used to train the model. All variables were derived from geospatial, hydraulic, and infrastructure data curated and processed using Python and ArcGIS Pro. A five-fold cross-validation technique was employed to evaluate model performance.

Model performance was assessed using three standard metrics: 1) Precision (14%): The proportion of predicted leaks that were actual leaks; 2) Recall (46%): The proportion of actual leaks that were correctly predicted; 3) F1 Score (21.4%): The harmonic mean of precision and recall. The confusion matrix revealed the model correctly identified 75 true leak cases (true positives) while producing 461 false positives and 88 false negatives. While this demonstrates that the model has predictive value, the relatively low precision underscores the need for further refinement before operational deployment. Despite limitations, the model’s tendency to produce more false positives than false negatives is acceptable in the context of Jackson’s system. Because leak detection is conducted across areas rather than individual pipes, clusters of false positives may still guide efficient deployment of field verification resources.

One of the most valuable outputs of the RFC model was its feature importance ranking, which indicates how much each variable contributes to predicting leaks. The most significant finding was the importance of distance to the nearest valve, suggesting that pipes further from operational control points are more vulnerable to leaks. This insight aligns with ongoing valve assessments by JXN Water and reinforces the need to prioritize valve maintenance and mapping.
Conversely, variables such as pipe material, age, and water demand, commonly assumed to be correlated with leakage, were found to have minimal predictive value in this study. This challenges existing assumptions and helps refocus asset management priorities on variables with stronger evidence of association with pipe failure. Several areas for refinement were identified that could substantially improve future model performance: time horizon adjustment, reformulation of aggregated variables, data structuring by zone, alternative aggregation techniques, and indexing realignment.

This research marks an important first step in exploring predictive analytics for leak detection in Jackson. Although the initial model’s F1 score is modest, the potential for progress is high. The RFC model has proven useful for identifying variables that matter and those that do not. This enables more targeted data collection, system upgrades, and field inspection protocols. Importantly, this study does not propose machine learning as a standalone solution. Rather, it positions data-driven leak prediction as a complement to field-based approaches. Jackson’s water distribution system remains complex and partially undocumented, particularly regarding pipe materials and installation ages, which are key variables known to influence acoustic leak detection success. In light of the work by researchers who have emphasized tailoring loss reduction strategies to utility-specific conditions, this study demonstrates how localized, data-informed modeling can serve as a foundation for smarter infrastructure planning. The experience of Jackson also provides a compelling case for other U.S. cities with aging infrastructure to adopt similar predictive tools.

The long-term vision is to reach an F1 score of 70–80%, where model predictions would be highly reliable.
Achieving this would enable JXN Water to operate a data-guided, field-verified, hybrid leak detection system that increases operational efficiency, conserves resources, and ultimately delivers more reliable service to the people of Jackson. By embracing a future that combines data science with traditional engineering solutions, JXN Water has the opportunity to build a smarter, more resilient water distribution network for the residents of Jackson.
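
As a check on the reported metrics, the sketch below recomputes precision, recall, and F1 from the confusion-matrix counts given in the abstract (75 true positives, 461 false positives, 88 false negatives). The function names are illustrative and are not taken from the study's code, which is not shown on this page.

```python
def precision(tp: int, fp: int) -> float:
    """Share of predicted leaks that were actual leaks."""
    predicted = tp + fp
    return tp / predicted if predicted else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual leaks that the model flagged."""
    actual = tp + fn
    return tp / actual if actual else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Counts as reported in the abstract (hypothetical variable names).
TRUE_POSITIVES = 75
FALSE_POSITIVES = 461
FALSE_NEGATIVES = 88

if __name__ == "__main__":
    print(f"precision = {precision(TRUE_POSITIVES, FALSE_POSITIVES):.1%}")
    print(f"recall    = {recall(TRUE_POSITIVES, FALSE_NEGATIVES):.1%}")
    print(f"F1        = {f1_score(TRUE_POSITIVES, FALSE_POSITIVES, FALSE_NEGATIVES):.1%}")
```

Run as a script, this prints a precision of 14.0%, a recall of 46.0%, and an F1 of 21.5%. The small gap from the reported F1 of 21.4% may come from rounding or from averaging across the five cross-validation folds; that explanation is an assumption, not something the abstract states.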

Subjects

Leak Detection, Data Science, Water Utility

Citation

McNeill, Jennifer (2025). LEVERAGING DATA ANALYTICS TO IDENTIFY SUBSURFACE LEAKS IN JACKSON, MISSISSIPPI’S RECOVERING WATER DISTRIBUTION SYSTEM. Master's project, Duke University. Retrieved from https://hdl.handle.net/10161/32266.


Except where otherwise noted, student scholarship that was shared on DukeSpace after 2009 is made available to the public under a Creative Commons Attribution / Non-commercial / No derivatives (CC-BY-NC-ND) license. All rights in student work shared on DukeSpace before 2009 remain with the author and/or their designee, whose permission may be required for reuse.