Wastewater

Domain: Wastewater

The driving use case for this work on process mining comes from conversations with wastewater treatment plant operators. The focus problem is to capture within SWADE the implicit knowledge about processes that is held today as institutional knowledge by experienced operators. Keeping this knowledge implicit is a major barrier to training and process improvement. The purpose of wastewater treatment process modeling is to discover implicit knowledge within treatment facilities and to give insight into process improvements, errors, and similar issues. An undergraduate student in the IoT-SITY project worked on integrating IoT within wastewater treatment facilities in order to improve operation and data collection in their treatment processes.

By applying process mining to synthetic data, we can see how process models can be produced and used within the SWADE framework. A three-step methodology was established to accomplish this goal. First, a general wastewater treatment process model was produced manually for our theoretical tests. Second, this model was used to generate artificial event logs with an application called Processes and Logs Generator 2.0 (PLG 2.0). Synthetic logs were needed because wastewater treatment facilities do not make logs publicly available, which is one of the problems SWADE seeks to solve. Third, a heuristic miner process mining algorithm was run on the synthetic logs to generate process models, using ProM (a process mining toolkit). The resulting models were then compared with the manually produced model. This comparison showed that readable and accurate process models can be produced, and it indicated that sensors need to be placed at each activity in the facility in order to record water flow between activities.
We concluded that IoT sensors can be integrated within wastewater treatment systems to collect data, form logs, and produce meaningful process models that can be used to uncover insights within facilities.
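
The generate-then-mine loop described above can be illustrated with a small, self-contained sketch. It is not PLG 2.0 or ProM: the stage names, the noise model, and the function names are hypothetical, and only the dependency measure commonly associated with heuristic mining, (|a>b| - |b>a|) / (|a>b| + |b>a| + 1), is implemented. A full heuristic miner also handles splits, joins, and loops.

```python
import random
from collections import Counter

# Hypothetical, simplified treatment stages; not taken from a real facility.
STAGES = ["screening", "grit_removal", "primary_clarifier",
          "aeration", "secondary_clarifier", "disinfection"]


def generate_log(n_traces, noise=0.05, seed=0):
    """Return synthetic traces; each trace has two adjacent stages
    swapped with probability `noise` (an assumed noise model)."""
    rng = random.Random(seed)
    log = []
    for _ in range(n_traces):
        trace = list(STAGES)
        if rng.random() < noise:
            i = rng.randrange(len(trace) - 1)
            trace[i], trace[i + 1] = trace[i + 1], trace[i]
        log.append(trace)
    return log


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Dependency measure in (-1, 1); values near 1 suggest a causes b."""
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


def mine_dependency_graph(log, threshold=0.8):
    """Keep only the directly-follows edges whose dependency meets the threshold."""
    counts = directly_follows(log)
    graph = {}
    for a, b in counts:
        score = dependency(counts, a, b)
        if score >= threshold:
            graph[(a, b)] = score
    return graph


if __name__ == "__main__":
    for edge, score in mine_dependency_graph(generate_log(200)).items():
        print(edge, round(score, 3))
```

The threshold is the main design choice in this sketch: a higher value suppresses edges that appear only in a few noisy traces, at the cost of also dropping genuine but rare paths.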

From a process mining perspective, our goals are to: (1) discover workflow (process) models from historical data; (2) monitor conformance of actual activities with the workflow models; (3) suggest workflow model improvements for safety, efficiency, etc.; (4) predict process success or failure early; and (5) ensure that private information is not released while we use the data for tasks (1) to (4). For the past year, the major effort was on goals (1) and (2) and on a preliminary study of (4) and (5).

In the wastewater domain, we work with wastewater utilities in Greater Chicago.  

Main Research Challenges

  • Extract domain expert knowledge and experience from historical data in the form of event logs. To extract this knowledge efficiently, we apply process mining techniques to the event logs. Process mining has emerged as a research field that uses available data to understand a process and improve it when possible. Given event logs, the goal of process mining is to extract process knowledge (e.g., process models) in order to discover, monitor, and improve the real processes. This year’s focus is on mining timing constraints and discovering process scenarios.
  • Develop mechanisms and tools to clean the historical data when the data contain outliers: Event logs contain abundant explicit information related to events, such as the timestamp and the actions that trigger the event. Process mining techniques rely on the assumption that these event logs accurately represent what is really happening in a given environment. However, many real-life event logs contain noisy, infrequent, missing, or false process information, generally classified as outliers. To improve the accuracy of knowledge discovery, we also worked on developing algorithms that use Hidden Markov Models to filter outliers out of event logs.
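
As a rough illustration of the timing-constraint focus mentioned in the first challenge, the sketch below derives one simple kind of constraint from an event log: the minimum and maximum elapsed time observed between each pair of directly consecutive activities. This is only one possible reading of "timing constraint" and is not the approach developed in this project; the event tuple layout, the sample events, and the function name are assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def mine_timing_constraints(events):
    """events: iterable of (case_id, activity, iso_timestamp) tuples.

    Returns {(a, b): (min_seconds, max_seconds)} for every pair of
    activities that occur directly after one another within a case.
    """
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))

    gaps = defaultdict(list)
    for trace in by_case.values():
        trace.sort()  # order the events of each case by timestamp
        for (t1, a), (t2, b) in zip(trace, trace[1:]):
            gaps[(a, b)].append((t2 - t1).total_seconds())

    return {pair: (min(v), max(v)) for pair, v in gaps.items()}


# Hypothetical events for two cases; not real plant data.
SAMPLE_EVENTS = [
    ("c1", "screening", "2024-01-01T08:00:00"),
    ("c1", "aeration", "2024-01-01T08:30:00"),
    ("c1", "disinfection", "2024-01-01T10:30:00"),
    ("c2", "screening", "2024-01-01T09:00:00"),
    ("c2", "aeration", "2024-01-01T09:45:00"),
    ("c2", "disinfection", "2024-01-01T11:15:00"),
]

if __name__ == "__main__":
    for pair, bounds in mine_timing_constraints(SAMPLE_EVENTS).items():
        print(pair, bounds)
```

Observed bounds like these are sensitive to outliers in the log, which is one reason the cleaning step in the second challenge matters before constraints are mined.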

Ongoing Results

  • Mining Timing Constraints from Event Logs for Process Model: Process mining is a technique for extracting process models from event logs. Event logs contain abundant information related to an event, such as the timestamp of the event and the actions that trigger it. Much existing process mining research has focused on discovering the process models behind event logs. How to uncover, from event logs, the timing constraints associated with the discovered process models is not well studied. In this work, we present an approach that extends existing process mining techniques to not only mine timing constraints but also integrate them with the process models discovered and constructed by process mining algorithms. A real-life road traffic fine management process is used as a case study to show how timing constraints in the fine management process can be discovered from event logs with our approach. Ideally, we would apply our methods to real-life wastewater process event logs to evaluate their efficacy. Unfortunately, we are still working on obtaining these event logs.
  • Using Event Log Timing Information to Assist Process Scenario Discoveries: The most important learning task in the broad field of process mining is process discovery, which is concerned with deriving process models from event logs. Over time, a range of process discovery algorithms has been proposed. Despite their demonstrated usefulness, these algorithms face challenges in environments where different scenarios exist. When different scenarios are grouped into one process model, not only does the model represent reality less accurately but, more importantly, the model becomes too complex to comprehend, which makes it difficult, if not impossible, to achieve the goal of better understanding, monitoring, and improving the current processes. Much existing research has focused on using activity names to assist process scenario discovery. In addition, many algorithms commonly used in the literature require prior knowledge of the number of process scenarios in the log, which is often not known a priori. We have developed a two-phase approach that obtains timing information from event logs and uses it to assist process scenario discovery without requiring any prior knowledge about the process scenarios.
  • Develop mechanisms and tools to clean the historical data when the data contain outliers: Many process mining algorithms assume that an event log accurately represents a working process as it takes place. Unfortunately, many real-life process event logs contain noisy, infrequent, missing, or false process information, generally classified as outliers. The presence of outliers leads to infrequent paths within the derived model. These paths clutter the process model and result in a model that does not accurately represent the actual behavior. To limit these adverse effects, event logs are typically subjected to a pre-processing phase in which outliers are removed manually. However, this is a challenging and time-consuming task, with no guarantee on the quality of the resulting model, especially for large event logs exhibiting complex process behavior. The inability to effectively detect and filter out outliers adversely affects the quality of the discovered model. To overcome this challenge, we have developed an approach that uses Hidden Markov Models to filter outliers out of event logs before any process discovery algorithm is applied, so that the accuracy of knowledge discovery improves.
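
The idea of filtering outliers before discovery can be sketched with a deliberately simplified stand-in. The code below uses a first-order Markov chain over observed activities, not a Hidden Markov Model, and it is not the approach developed in this project. Transition probabilities are estimated from the log itself, each trace is scored by the geometric mean of its transition probabilities, and traces below a threshold are set aside. The function names, the start and end markers, and the `min_score` default are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

START, END = "<start>", "<end>"  # artificial markers added around each trace


def fit_transitions(log):
    """Estimate P(next activity | current activity) from the log."""
    counts = defaultdict(Counter)
    for trace in log:
        path = [START, *trace, END]
        for a, b in zip(path, path[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for a, nxt in counts.items()}


def trace_score(probs, trace):
    """Geometric mean of the transition probabilities along a trace.

    Returns 0.0 if the trace contains a transition that was never observed.
    """
    path = [START, *trace, END]
    logp = 0.0
    for a, b in zip(path, path[1:]):
        p = probs.get(a, {}).get(b, 0.0)
        if p == 0.0:
            return 0.0
        logp += math.log(p)
    return math.exp(logp / (len(path) - 1))


def filter_outliers(log, min_score=0.5):
    """Split the log into (kept, dropped) traces by their score."""
    probs = fit_transitions(log)
    kept, dropped = [], []
    for trace in log:
        target = kept if trace_score(probs, trace) >= min_score else dropped
        target.append(trace)
    return kept, dropped


if __name__ == "__main__":
    # Hypothetical log: 19 regular traces and one with two steps swapped.
    demo = [["a", "b", "c"]] * 19 + [["a", "c", "b"]]
    kept, dropped = filter_outliers(demo)
    print(len(kept), "kept;", dropped, "dropped")
```

Because the transition probabilities are fitted on the unfiltered log, the outliers themselves influence the scores. That is acceptable when outliers are rare, as assumed here, but it is a limitation of this sketch.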