Unlocking an affordable zero-carbon source of water with data analytics

Most Indian cities are both blessed and challenged with this affordable zero-carbon water source: the almost 50% of water that cities produce but lose before it reaches the consumer.

About 30% of urban households in gated societies have the luxury of 24/7 water supply and RO filters safeguarding their health but the rest of urban India relies on the city to supply them clean and affordable water. Even gated societies themselves are reliant on municipal water as a primary source. Municipal water access is not just an urban livability issue but urgently needs to be seen in a climate and emissions context.

Let’s consider Bengaluru as an example. Bengaluru’s water supply Board pumps in water from reservoirs near the Cauvery river, about 80 km away. Once the water gets to the city, it goes through chemical treatment plants before being pumped into a massive water distribution network of 9,000 km to reach 1 million households and businesses. Bengaluru’s daily Cauvery water production of 1,400 million liters (ML) consumes approximately 140 MWh, equivalent to 105 metric tonnes of CO2 emissions.

In Bengaluru, only 50% of the 1400 ML pumped into the distribution network reaches the customer. Recovering most of the 50% lost would nearly double water availability at the customer end, with no additional energy used and no additional greenhouse gases emitted.
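
To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python using only the figures quoted above. The 10% loss scenario is a hypothetical target chosen for illustration, not a number from any utility.

```python
# Figures quoted in the post for Bengaluru's Cauvery supply.
DAILY_PRODUCTION_ML = 1400   # million litres pumped per day
DAILY_ENERGY_MWH = 140       # approximate energy used per day
DAILY_EMISSIONS_T = 105      # tonnes of CO2 per day
LOSS_FRACTION = 0.50         # share lost before reaching customers


def footprint_per_delivered_ml(loss_fraction: float) -> dict:
    """Energy and emissions per ML that actually reaches a customer."""
    delivered_ml = DAILY_PRODUCTION_ML * (1 - loss_fraction)
    return {
        "delivered_ml": delivered_ml,
        "kwh_per_ml": DAILY_ENERGY_MWH * 1000 / delivered_ml,
        "kg_co2_per_ml": DAILY_EMISSIONS_T * 1000 / delivered_ml,
    }


today = footprint_per_delivered_ml(LOSS_FRACTION)
# Hypothetical scenario: losses cut to 10% (assumed, not from the post).
target = footprint_per_delivered_ml(0.10)

for label, result in (("today", today), ("10% losses", target)):
    print(label, {key: round(value, 1) for key, value in result.items()})
```

At 50% losses, every ML that reaches a customer carries about 200 kWh and 150 kg of CO2. The same pumping effort spread over more delivered water brings both numbers down.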

So, why do Indian cities have such huge water losses? And how can just data analytics help recover these losses?

Central schemes such as AMRUT, Smart Cities Mission and Jal Jeevan Mission are propelling the expansion and digitization of Indian water utilities. What is not being addressed is the management challenge - the complex scale of city networks, lack of dedicated operational expertise, networks designed for 24/7 supply but being operated in an intermittent supply mode, etc.

Hardware for monitoring, automating and controlling water networks is getting commoditized and affordable. But for the utility manager, the hardware alerts only add to the deluge of customer complaints. Detecting and diagnosing pipe bursts or leaks, and prioritizing the lossy pipes within thousands of kilometers of network, becomes overwhelming.

Cities are increasingly getting metered at the household level. And within the lakhs of households, there will be ones who have tampered with or disconnected their meter, or households where the meter is either not read or has stopped working. Finding unmetered or illegal usage of water becomes as difficult as finding a needle in a haystack.

Fixing an individual pipe or replacing a meter is a simple task. But when network leaks and tampered connections are allowed to fester and build up over the years, losses start becoming overwhelmingly huge. The water networks are underground in the literal sense and also in the mindspace of a municipal manager: out of sight is indeed out of mind.

City utilities are extremely underserved in terms of technology solutions to aid operational management. The missing piece, as illustrated above, is an analytics solution which works across data silos and unifies utility expertise, sensor analytics, hydraulics, geospatial expertise, etc. to pinpoint losses. Ideally, the solution should also provide a post-analytics task workflow to enable managerial follow-up on in-the-field investigations and fixes.

What would this data-driven intelligence look like in the utility’s day-to-day operations?

Existing utility networks undergoing expansion or rehabilitation need a number of pipes replaced within an available budget. City utility managers could prioritize only the parts of the network that have a high likelihood of failure by analyzing past maintenance history and customer complaints along with the network’s environmental stresses.
The utility manager can also proactively identify failing pipes before leaks or bursts actually develop.

In cities where customers have been metered and supply monitoring has been installed, city utility managers can be aided by a real-time bird’s-eye view and diagnosis of the lossy pipes. Operational staff can focus on tens of kilometers of pipe instead of thousands of kilometers.

On the demand side, utility managers need to ensure customer consumption is being properly measured, ensure consumption is being equitably satisfied across the city and monitor for any illegal usage or theft. They would need analysis at the meter or household level while also analyzing across socio-economic and spatial cohorts.

Why do we need this NOW?

With water network expansion and privatization under way in India, every city utility, whether one of the 6 mega-cities or one of the 200 Tier-1/Tier-2 cities, will require such decision-support solutions to ensure long-term efficiency and sustainability of the new infrastructure.

Water production and distribution is an energy-intensive large-scale industrial process and for many Indian cities, water and wastewater handling is the largest municipal consumer of electricity, typically accounting for a third of total electricity consumed.

Organizations such as the World Bank and NITI Aayog estimate that by 2030, India will face a 50% shortfall in water supply, leaving hundreds of millions of people facing severe water scarcity and resulting in about a 6% loss in the country’s GDP by 2050.

Fixing water losses, at both the network and customer ends, positively impacts water availability for consumers, improves the city’s productivity, reduces energy spend, reduces the emissions footprint and eases upstream pressure on water sources and watershed ecosystems.

Our city. Our commons. Our responsibility.

We have leap-frogged our way to pervasive digital payments and 10-min groceries. It is time for a homegrown Utility Stack, a la India Stack, to revolutionize the way Indian cities run their utilities. And it is not just an India problem: the World Bank estimates that, as a global average, 30% of the world’s piped water is lost before it reaches the customer.

At SmartTerra, we have embarked on such a journey, putting AI/ML to work for Indian water utilities, undeterred by the chaos and data-unreliability of a typical Indian city utility, and have demonstrated quantified benefits while working with infrastructure giants such as SUEZ and L&T in 6 cities in India.

Drop us a message at [email protected] or if you are in Bengaluru, drop by at HSR Layout! Would love to connect.


  1. NITI Aayog. 2019. Composite Water Management Index.
  2. How is India addressing its water needs? World Bank. Accessed 24 August 2022
  3. Embodied energy comparison of surface water and groundwater supply options. Weiwei Mo, Qiong Zhang, James R. Mihelcic, David R. Hokanson. University of Florida, November 2011
  4. The Carbon Footprint of Tap Water Is a Lot Higher Than You Think, Lloyd Alter, Treehugger, June 2, 2021
  5. US Environmental Protection Agency. Greenhouse Gases Equivalencies Calculator - Calculations and References.

There can be alternative financing models for great climate outcomes for Indian cities, helping them escape the L1 tender nightmare.

Here is an example:

Microsoft has pledged to be water positive by 2030 by committing to: reduce water use, replenish water sources, provide people with access to water and sanitation services, advocate for effective water policy and drive innovation and data digitization.
Thames Water, London’s water utility, has committed to reduce the water lost to leakages in its distribution network, currently at 24%, down to 20% by 2025 and 10% by 2050.
FIDO is a startup which has developed a sensor/device which once placed on a water distribution pipe, identifies leaks by analyzing the acoustic signature of water gushing out.

The win-win-win collaboration: Microsoft is financing the deployment of FIDO devices across an additional 350km of Thames Water’s pipe network. For the next 10 years, Microsoft will link the identification and repair of leaks with a specific quantified volume of water saved. The final volumetric savings will then be factored into Microsoft’s overall water use calculations for inclusion in its annual environmental, social and governance (ESG) reporting.

India has quite a few corporate giants. At SmartTerra, we have demonstrated water loss reduction at city scale. Our cities could definitely use some help.

Can the folks here help replicate such a win-win-win collaboration in India?

Reference: GWI April 2023


Thanks Gokul. Setting up a call with you soon.

@Nikita_Harikishan thoughts?

Hi Gokul, thanks for sharing, man. Felt like I learnt something! Here is just one addition.

  • This is not just an urban problem. In fact, 80% of villages in my district, Samastipur, have piped water systems (this has happened in the last 2 years).

This means this is a real problem and the opportunity size is absolutely large! :slight_smile:

Would love to keep learning more about the solution. If you can talk about the AI/ML model in brief, that would be great - I will learn more! :slight_smile:

Hi @Suman_Jile - Navaneethan from SmartTerra here. Happy to answer your question. There are broadly two separate categories when it comes to the AI/ML that we focus on: data unification and algorithms/models.

Data Unification

Data from utilities comes in different forms: geospatial pipe network data, sensor time-series, customer information & billing, maintenance history, complaints, etc. These data are “silo-ed”, meaning that they don’t reference or “connect with” each other. This is unfortunate, because the insights needed to solve water challenges are at the intersection of these different datasets.

Our products take raw data from utilities and use different algorithms for data verification, data cleaning, and unification. The output of this processing stage is a unified data model that enables utilities to visualise and understand the interconnected nature of their operations, network, and customers.

As an example, consider geospatial data: we have leaks, maintenance records, customer complaints, pipe network, and customer meter locations. These datasets need to be linked to each other to provide a richer understanding of where different network assets are, their history, and how they interact with each other.
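
As a rough illustration of what "linking" means here, the sketch below snaps point events (leaks, complaints) to the nearest pipe segment. This is not SmartTerra's pipeline. It assumes coordinates are already projected to metres, treats each pipe as a single straight segment, and uses hypothetical names (`Pipe`, `link_events_to_pipes`, `max_distance_m`) plus made-up data.

```python
import math
from dataclasses import dataclass


@dataclass
class Pipe:
    pipe_id: str
    start: tuple[float, float]  # (x, y) in metres, assumed projected CRS
    end: tuple[float, float]


def point_to_segment_distance(p, a, b) -> float:
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate pipe: both ends coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line, then clamp to the segment's extent.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def link_events_to_pipes(events, pipes, max_distance_m=15.0):
    """Map each event id to its nearest pipe id, or None if too far away."""
    links = {}
    for event_id, location in events.items():
        nearest = min(
            pipes,
            key=lambda pipe: point_to_segment_distance(location, pipe.start, pipe.end),
        )
        distance = point_to_segment_distance(location, nearest.start, nearest.end)
        links[event_id] = nearest.pipe_id if distance <= max_distance_m else None
    return links


# Made-up example data.
pipes = [Pipe("P-1", (0, 0), (100, 0)), Pipe("P-2", (100, 0), (100, 80))]
events = {"leak-17": (40, 3), "complaint-202": (104, 50), "leak-99": (300, 300)}
links = link_events_to_pipes(events, pipes)
print(links)
```

Events that fall outside the distance threshold stay unlinked rather than being forced onto a pipe, so they can be reviewed manually. A real implementation would more likely use a spatial index and polyline geometries.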

AI/ML Algorithms

Once we produce a unified data model, we use AI algorithms for:

  • Predicting which pipes in the network are most likely to fail

    • We train our model to learn from past leaks, maintenance activities, environmental stresses (soil conditions, waterlogging, traffic, etc.), and operational issues (variations in pressure and flow)
    • The model identifies patterns across these different data sources that lead to pipe failure
    • Predictions from the model are tested by utility staff on the ground and verified
    • Feedback from their on-ground investigation is used as additional input to improve the model
  • Automatically detecting signatures of sensor failure and anomalous network behaviour

    • Here, we look at sensor data time-series to identify periods in which the sensor’s signal exhibits anomalous patterns using time-series processing techniques to smooth the data and extract significant features from it
    • We also use imputation techniques (such as Kalman Filters) to “fill-in” missing data in the time-series
    • We then compare the sensor’s current properties to how it historically behaved and against patterns from similar sensors
    • This tells us whether the behaviour of the sensor is anomalous
    • Deviations are flagged and presented for on-ground verification, with field technicians investigating the possibility of leaks as the cause of this anomalous behaviour
  • Pinpointing customer meters that are malfunctioning and need immediate replacement

    • Customers are placed in different cohorts and the algorithm flags a customer meter as malfunctioning if it deviates negatively from its cohort over time (meaning that its pattern starts to differ from its fellow cohort members)
    • Feedback from on-ground staff comes from in-situ or lab meter testing, and these values are used to recalibrate and improve model performance
  • Hydraulic calibration algorithms for detecting abnormal pressure and flow data

    • For finding pipes with losses, we also use a hydraulic modeling and optimization approach.
    • A genetic algorithm based approach is used to look for the optimal set of hydraulic parameters which can model the water-network’s pressure and flow behavior while also modeling leaks or illegal usage.
    • The optimal set of hydraulic parameters is assumed to be found when the delta, between measured values from sensors and the model’s prediction, is minimal.
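
The calibration idea in the last bullet group can be sketched with a toy example. The code below is not the approach described above: it swaps the real hydraulic solver for a one-pipe formula (`simulate_pressure`, a made-up stand-in), invents two parameters (a roughness term and a leak coefficient), and runs a very small genetic algorithm to minimise the squared delta between "measured" and modelled pressures. All names, bounds and data are assumptions for illustration.

```python
import math
import random


def simulate_pressure(inlet_pressure, demand, roughness, leak_coeff):
    """Toy stand-in for a hydraulic solver: downstream pressure of one pipe."""
    leak_flow = leak_coeff * math.sqrt(inlet_pressure)  # pressure-driven leak
    return inlet_pressure - roughness * (demand + leak_flow) ** 2


def calibration_error(params, observations):
    """Sum of squared deltas between measured and modelled pressure."""
    roughness, leak_coeff = params
    return sum(
        (measured - simulate_pressure(inlet, demand, roughness, leak_coeff)) ** 2
        for inlet, demand, measured in observations
    )


BOUNDS = [(0.0, 0.05), (0.0, 2.0)]  # assumed search ranges: roughness, leak


def breed(parent_a, parent_b, rng, mutation_scale=0.05):
    """Uniform crossover followed by Gaussian mutation, clipped to BOUNDS."""
    child = []
    for gene_a, gene_b, (low, high) in zip(parent_a, parent_b, BOUNDS):
        gene = rng.choice((gene_a, gene_b))
        gene += rng.gauss(0.0, mutation_scale * (high - low))
        child.append(min(high, max(low, gene)))
    return child


def calibrate(observations, generations=60, population_size=40, seed=0):
    rng = random.Random(seed)
    population = [
        [rng.uniform(low, high) for low, high in BOUNDS]
        for _ in range(population_size)
    ]
    history = []  # best error seen in each generation
    for _ in range(generations):
        population.sort(key=lambda c: calibration_error(c, observations))
        history.append(calibration_error(population[0], observations))
        survivors = population[: population_size // 2]  # elitism: keep best half
        children = [
            breed(rng.choice(survivors), rng.choice(survivors), rng)
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    best = min(population, key=lambda c: calibration_error(c, observations))
    return best, history


# Synthetic "sensor" data generated from known parameters (made up).
TRUE_ROUGHNESS, TRUE_LEAK = 0.02, 0.8
conditions = [(40.0, 10.0), (38.0, 14.0), (42.0, 8.0), (36.0, 18.0)]
observations = [
    (inlet, demand, simulate_pressure(inlet, demand, TRUE_ROUGHNESS, TRUE_LEAK))
    for inlet, demand in conditions
]
best, history = calibrate(observations)
print("best parameters:", best, "error:", calibration_error(best, observations))
```

Because the best half of each generation survives unchanged, the best error never gets worse from one generation to the next. In practice the toy formula would be replaced by a full network simulation, for example one run through EPANET.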

Thanks, Naveen - super helpful. I have an almost complete picture except for a few things - what is the AI/ML model you are using to train and predict on your unified data points and anomalies? (Don’t tell me this is a proprietary model. I think @knadh Sir will also agree: when all the high-quality models are available via ‘pip install’, there is absolutely no point in building a model from scratch.)

And that question also raises a few additional questions:

  1. What is the model accuracy?
  2. What does the data set look like?

The only reason I am asking you all these questions is because even if you are getting the call with the Rainmatter team they would like to know all these. :slight_smile:

@Suman_Jile sure, happy to provide more detail.

In our context, we use different algorithms, some open-sourced and others developed in-house (when we aren’t able to find a suitable approach in the research literature).

  1. Ensemble tree-based classification algorithms like Random Forests, Gradient-boosted Trees, etc.
  2. Clustering algorithms such as HDBSCAN, OPTICS, agglomerative clustering, etc.
  3. Hydraulic modelling approaches based on EPANET
  4. Time-series smoothing and extraction techniques: LOESS and MSTL
  5. Cohort-based time-series trend detection
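
To give a flavour of item 5, here is a minimal sketch of cohort-based trend detection for customer meters. It is one plausible way to do it, not the in-house algorithm: each meter's reading is divided by its cohort's median for the same period, and a meter is flagged when the least-squares slope of that ratio falls below a threshold. The function names, the threshold and the readings are all made up, and the code assumes equal-length series and non-zero cohort medians.

```python
from statistics import median


def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def flag_declining_meters(cohort_readings, max_decline_per_period=-0.03):
    """Flag meters whose consumption drifts downward relative to the cohort."""
    n_periods = len(next(iter(cohort_readings.values())))
    cohort_median = [
        median(readings[t] for readings in cohort_readings.values())
        for t in range(n_periods)
    ]
    flagged = {}
    for meter_id, readings in cohort_readings.items():
        # Ratio to cohort median removes seasonality shared by the cohort.
        ratios = [readings[t] / cohort_median[t] for t in range(n_periods)]
        trend = slope(ratios)
        if trend <= max_decline_per_period:
            flagged[meter_id] = trend
    return flagged


# Made-up monthly consumption (kL) for one cohort of similar households.
cohort = {
    "M-1": [20, 21, 19, 22, 20, 21],
    "M-2": [18, 19, 18, 20, 19, 19],
    "M-3": [22, 22, 21, 23, 22, 22],
    "M-4": [21, 19, 16, 14, 11, 9],   # steadily under-registering
    "M-5": [19, 20, 19, 21, 20, 20],
}
flagged = flag_declining_meters(cohort)
print(flagged)
```

Only M-4 is flagged in this toy data. Comparing against the cohort median rather than the meter's own history avoids flagging a whole neighbourhood when supply hours drop for everyone.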

Regarding model accuracy, let’s consider pipe “likelihood of failure” prediction. Our model predicts how likely a pipe is to leak in the next 12 months. We compare this against the actual 12-month data that the model has not been trained on. The accuracy measure here is a utility curve: how many leaks can we find by examining the smallest extent of network? A water utility would like to maximise the number of leaks identified while having to explore the smallest fraction of the network. In this measure, for a mere 10% of network length, we identify 26% of leaks. This means that over a quarter of the leaks fall in the tenth of the network that the model ranks as highest risk.
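
For readers who want to see how such a utility-curve point can be computed, here is a small sketch. The data is invented and the function name `leaks_captured` is hypothetical. It ranks pipes by predicted risk, "inspects" them in that order until a length budget is used up, and reports the share of held-out leaks found.

```python
def leaks_captured(pipes, inspect_fraction):
    """Share of held-out leaks found when inspecting the riskiest pipes first.

    pipes: list of (predicted_risk, length_km, leaks_in_holdout_period)
    inspect_fraction: fraction of total network length that can be inspected
    """
    total_length = sum(length for _, length, _ in pipes)
    total_leaks = sum(leaks for _, _, leaks in pipes)
    budget_km = inspect_fraction * total_length + 1e-9  # tolerance for rounding
    inspected_km = 0.0
    found = 0
    for _, length, leaks in sorted(pipes, key=lambda p: p[0], reverse=True):
        if inspected_km + length > budget_km:
            break  # next pipe would exceed the inspection budget
        inspected_km += length
        found += leaks
    return found / total_leaks


# Made-up network: ten 1 km pipes with a risk score and held-out leak counts.
pipes = [
    (0.90, 1, 3), (0.80, 1, 2), (0.60, 1, 1), (0.50, 1, 0), (0.40, 1, 1),
    (0.30, 1, 0), (0.20, 1, 1), (0.15, 1, 0), (0.10, 1, 0), (0.05, 1, 0),
]
for fraction in (0.1, 0.2, 0.5, 1.0):
    print(f"{fraction:.0%} of network -> {leaks_captured(pipes, fraction):.1%} of leaks")
```

A model with no predictive power would find roughly the same share of leaks as the share of network inspected, so the gap between the two numbers is what the utility is paying for.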

When we add digital twin modelling (sensor data + hydraulic calibration) into the mix, we achieve 77% accuracy in identifying and localising leaks.

Regarding data sets, we use the following:

  1. Geospatial data - GeoJSON files/shapefiles of the network, leaks & maintenance, environment, and operational boundaries
  2. CSVs/Excel files of customer metering, billing information, sensor data, and complaints
  3. INP files containing hydraulic data describing network operations such as reservoirs, tanks, flows, and pressures

Finally, I’d like to distinguish between an algorithm and a model. When you say that most are available from pip install, that refers to algorithms: these are indeed open-source and easily available to anyone. A model, on the other hand, is a combination of an algorithm, data/engineered features, and hyperparameters - it’s an artefact that is the output of the modelling process. This part is, of course, proprietary.

Thanks for sharing @Navaneethan. I would love to know the reasons behind the lower accuracy rate of 77%. Is it because we have smaller data sets? And will the accuracy improve as we get larger datasets?

Also, on the difference between Models and algorithms - I know the difference. :slight_smile: But here is one part from @knadh Sir - I follow the best!

Thank you for your interest @Suman_Jile. Great questions and thoughts!

Analyzing the water network with just software, without any proprietary hardware required, with an accuracy greater than 50% is super beneficial to the utilities.

I think we can better contextualize the accuracies and models in a call. Will set up a call with you sometime soon.

Understood @gokul - my contributions are limited just to the Grove! I am hoping you will have a call with the Rainmatter team soon. :slight_smile:

@gokul @Navaneethan

Have set up a call. Looking forward.
