Stormwater


In the stormwater domain, our focus problem is dry-weather monitoring, pursued in collaboration with partners in Orange County, CA. The goal is to identify the source locations and timing of effluents and contaminants entering the stormwater system during dry (i.e., non-storm) periods. This is difficult due to the large number of nodes in the network and a general lack of instrumentation throughout it. We are exploring a range of research problems, from IoT device placement to generalizable time-series analytics. In particular, we have studied several associated subproblems, including sensor placement, which selects the locations at which to install sensors under a budget constraint, and resource-efficient monitoring, which aims to reduce high deployment and operational costs. 

We also studied the role of network structure and its relation to the generated data using time-series analytics, which leads to the idea of generalizable data analytics: learning ML models that are robust to location-specific biases. One way to guide these decisions is to rely on specialized simulators of real-world water networks, such as EPA's Storm Water Management Model (SWMM). 

Main Research Challenges

  • IoT Placement to support reliable and timely monitoring: We studied the general problem of IoT instrumentation in the stormwater use case. The focus problem is how to place sensors in a water network to support reliable and timely monitoring of anomalous phenomena. This naturally decomposes into two problems: first, IoT sensors must be deployed in the water network so that transient anomalies can be reliably detected; second, observations from the instrumented IoT devices must be used to derive the set of potential sources. However, the optimal solution for each problem, considered independently, degrades the optimal solution for the other. This follows from the objectives typically considered in the two problems: IoT placement typically maximizes the coverage of locations in the network using the fewest sensors, while source identification typically relies on sensor observations to eliminate infeasible sources of anomalies. Maximizing coverage can limit the number of locations that can be eliminated, which leads to the challenge of jointly considering both IoT placement and source-identification capability in a single solution. To address this, we integrate historical data and semantic information about land use with geographical data about the water network. We consider factors such as the coverage of locations by deployed sensors, the ability of a sensor observation (or its absence) to constrain the set of potential source locations where anomalies could be introduced, and the priorities given to different locations, which we derive from domain knowledge and historical data. 
  • Supporting Adaptive Edge Analytics through Reinforcement Learning: The Orange County Public Works agency monitors storm drains around OC by instrumenting many of them with battery-operated sensing units. Each unit carries multiple sensors, measuring turbidity, dissolved oxygen, conductivity, and temperature, among others. Since the units are installed underwater, they are kept in ruggedized housings, which complicates battery replacement; constant replacement would require significant human time and effort, especially given the large number of units across different storm drains. We built a framework for dry- and wet-weather monitoring of contaminants that leverages these sensing units and ensures high monitoring performance while conserving battery life by utilizing a subset of sensors rather than all of them (i.e., coarse-grained vs. fine-grained sensing). However, since each storm drain and location can have different event patterns, a one-size-fits-all approach to switching between coarse- and fine-grained sensing would not perform well across all locations. Hence, we adopted a data-driven approach, training a reinforcement learning agent for each sensing location that uses historical data from the corresponding sensing unit to learn the unique event characteristics of wet- and dry-weather contamination. Each agent uses this knowledge to adaptively switch between coarse- and fine-grained monitoring whenever appropriate, with the goal of ensuring that no contamination events are missed while maximizing battery life. Our results demonstrate that this data-driven learning approach achieved more than 90% accuracy in detecting events while extending the battery life of the units by more than a month, significantly reducing the time and effort required for their maintenance. 
  • Generalization of Analytics across communities: The effectiveness of contamination monitoring in storm drains depends on the quality of the analytical model that is trained. Machine learning is a powerful tool for this purpose, but it requires substantial historical data to train effective models. Furthermore, collecting this data requires heavy instrumentation of a community's storm drains, which in turn requires significant investments of capital and time for sensor deployment. Not all communities have access to the capital and time needed to collect sufficient data to train effective monitoring models. Moreover, sharing data between communities to address this is challenging, since most agencies have stringent privacy and security policies that prevent them from sharing sensitive and confidential data. In addition, the data collected by each community may contain biases that influence the model during training, leading to poor performance when the model is deployed in a new community where these biases are not present. To address these challenges, we developed a distributed approach to training analytical models that can be generalized or shared across communities without a loss in performance, yielding significant savings in cost and time, since other communities can reuse models without collecting large amounts of data. Our approach uses federated learning, a distributed training technique that trains an analytical model on data from multiple sources or communities without sharing the raw data, thereby preserving existing privacy policies. To handle data biases, we leverage the stability of different data features during training to identify measurements that have a causal relationship with contamination and those that are merely spuriously correlated, and hence biased. Our approach ensures that models ignore such biases and can therefore be reused across communities without any loss in performance. Our experimental results show that our approach outperforms existing training approaches by over 20% in accuracy when training and reusing a model for stormwater contamination detection across different communities in the United States. 
  • Accurate Detection of Anomalies and Time-Varying Phenomena: Our focus problem revolves around dry-weather monitoring with partners in Orange County, CA. Here, the goal is to identify the source locations and timing of (potentially illicit) flows of effluents and contaminants into the stormwater system during dry, i.e., non-storm, periods. From the stormwater perspective, our goals this past year were to combine IoT sensing with data-driven learning approaches to develop (1) solutions for high-quality monitoring that ensure resource efficiency by leveraging stormwater event characteristics, and (2) a methodology to translate monitoring solutions from one stormwater location to another without the need for additional training data or instrumentation. 
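The budget-constrained placement problem above can be approached as a greedy weighted-coverage heuristic. The sketch below is illustrative only — the toy network, uniform weights, and function names are hypothetical, not our deployed pipeline. Each candidate site "covers" the upstream locations whose flow passes through it, and sites are chosen one at a time to maximize the weighted number of newly covered potential source locations:

```python
def upstream_coverage(predecessors, node):
    """All locations whose flow passes through `node`; a sensor at
    `node` can observe anomalies introduced at any of them."""
    covered, stack = set(), [node]
    while stack:
        n = stack.pop()
        if n in covered:
            continue
        covered.add(n)
        stack.extend(predecessors.get(n, []))  # walk upstream
    return covered

def greedy_placement(predecessors, candidates, budget, weight=None):
    """Greedily pick up to `budget` sensor sites, maximizing the
    weighted coverage of potential source locations. `weight` lets
    domain knowledge prioritize some locations (default: 1.0 each)."""
    weight = weight or {}
    chosen, covered = [], set()
    for _ in range(budget):
        best, best_gain = None, 0.0
        for c in candidates:
            if c in chosen:
                continue
            gain = sum(weight.get(n, 1.0)
                       for n in upstream_coverage(predecessors, c) - covered)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # no candidate adds coverage
            break
        chosen.append(best)
        covered |= upstream_coverage(predecessors, best)
    return chosen, covered
```

Note that pure coverage maximization is exactly the objective that can conflict with source identification: a single sensor at the outfall covers everything but eliminates nothing upstream, which is why our actual formulation also scores a site's ability to constrain the set of feasible sources.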
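The adaptive coarse/fine switching can be illustrated with a minimal tabular Q-learning agent. Everything here — the two states, the reward values, and the hyperparameters — is a hypothetical simplification of the per-location agents described above, not the REAM implementation:

```python
import random

ACTIONS = ["coarse", "fine"]  # sensor subset vs. all sensors

class SensingAgent:
    """Tabular Q-learning agent that chooses a sensing mode per step."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:          # explore
            return random.choice(ACTIONS)
        return max(ACTIONS,                          # exploit
                   key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state, action, reward, next_state):
        best_next = max(self.q.get((next_state, a), 0.0) for a in ACTIONS)
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (
            reward + self.gamma * best_next - old)

def reward(action, event_present):
    """Illustrative reward: heavily penalize missed events, lightly
    penalize battery spent on fine sensing when conditions are quiet."""
    if event_present:
        return 10.0 if action == "fine" else -10.0
    return 1.0 if action == "coarse" else -1.0
```

After training on a (historical) trace, the learned policy uses fine-grained sensing only when drain conditions suggest an event is likely, which is the mechanism that trades detection accuracy against battery life.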

Ongoing Results

  • Built a framework, titled REAM, for dry- and wet-weather monitoring of contaminants in storm drains that leverages real-time sensing data from units instrumented across Orange County. The framework consists of reinforcement-learning-based agents that train on historical data to learn contamination event characteristics, which are leveraged to prolong the battery life of the sensing units. The agents switch between coarse- and fine-grained monitoring by activating different subsets of sensors based on observed storm drain conditions and historical knowledge. Our work demonstrates the benefits of such a data-driven learning approach in achieving good monitoring performance while judiciously utilizing limited resources. 
  • Explored sensor placement policies to help support dry-weather monitoring efforts in a stormwater network spanning Orange County. The initial placement policies used only simple geographical properties and graph features of the underlying network (e.g., elevation, node degree); we also learned to operate the stormwater simulator SWMM and set up a pipeline for testing more complex heuristics. From this baseline, we plan to explore placement heuristics that are driven by historical data and that consider semantic tags of neighboring regions. 
  • Began to address the challenge of sharing solutions between different community agencies and stakeholders by developing a methodology for training and sharing data-driven monitoring models. Our solution enables training these analytical models on data from multiple sources (agencies) in a distributed manner without sharing the raw data, a major privacy concern for these agencies. In addition, our training approach ensures that the final monitoring model is not influenced by data biases present in any single agency's data, ensuring that it can be shared with other communities without a loss in performance. 
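The distributed training in the result above follows the federated averaging pattern: each agency runs a few local gradient steps on its private data, and a server aggregates only the model weights. Below is a minimal sketch using logistic regression; the names, defaults, and synthetic scale are hypothetical, and our actual approach additionally screens out spuriously correlated features during training:

```python
import numpy as np

def local_update(w, X, y, lr=0.1, epochs=5):
    """A few local gradient steps of logistic regression on one
    agency's private data; only the weights leave the site."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # predicted probabilities
        w = w - lr * X.T @ (p - y) / len(y)       # gradient step
    return w

def federated_average(datasets, dim, rounds=20):
    """FedAvg sketch: the server broadcasts the global weights, each
    agency trains locally, and the server takes a sample-weighted
    average of the returned weights."""
    w = np.zeros(dim)
    total = sum(len(y) for _, y in datasets)
    for _ in range(rounds):
        updates = [local_update(w.copy(), X, y) for X, y in datasets]
        w = sum(len(y) / total * u
                for u, (_, y) in zip(updates, datasets))
    return w
```

Because only weight vectors are exchanged, each agency's raw sensor measurements never leave its own infrastructure, which is what lets the approach coexist with the agencies' privacy policies.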
[Figure: Sensing units in stormwater flows]
[Figure: SWMM (Stormwater) Model of Newport Beach, CA]
[Figure: EPA SWMM (Newport, CA) simulates pollutants being propagated through the network]