Interview with Martha White
WT: Martha, in 2020 you were named one of the Top 10 Researchers in the World to Watch by IEEE (Institute of Electrical and Electronics Engineers).
Congratulations on that amazing distinction. Please introduce yourself to our viewers.
White: There are many amazing early career researchers, and I am proud of being included in this highly competitive group.
I am an Associate Professor in Computer Science at the University of Alberta and now also the CEO of RL Core Technologies. My main area of research is reinforcement learning, which is a sub-area of machine learning focused on automatic decision-making.
WT: Please explain ‘reinforcement learning’ and ‘automatic decision-making’.
White: This contrasts with most machine learning systems, which extract patterns from data to make predictions. Examples include speech recognition, or ChatGPT, which is trained on text from the internet. This learning is passive: the machine learning system learns from the data it is given.
The key distinction in reinforcement learning is that the control system (the agent) actively decides what data to get to improve its decision-making policy.
Consider a simple example for controlling a thermostat in your home to reduce energy costs. A typical machine learning prediction system might use data collected from a year in your home to predict how much energy you will use in the next few days.
A reinforcement learning agent goes beyond prediction and tries to learn to reduce energy costs.
It might try different thermostat temperatures to better learn what settings reduce costs rather than simply using past data.
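The trial-and-error idea in that thermostat example can be sketched in code. This is a minimal illustration only, assuming a simplified setup where the agent chooses among a few fixed setpoints and observes a daily energy cost; it uses an epsilon-greedy bandit strategy, which is one simple way to balance trying new settings against using the best one found so far, not a full reinforcement learning system.

```python
import random

# Hypothetical candidate thermostat setpoints (degrees C).
SETPOINTS = [18.0, 19.0, 20.0, 21.0]
EPSILON = 0.1  # fraction of days spent exploring other setpoints

class ThermostatAgent:
    """Epsilon-greedy agent: mostly exploits the cheapest known
    setpoint, but occasionally tries others to keep learning."""

    def __init__(self):
        self.totals = {s: 0.0 for s in SETPOINTS}  # summed observed costs
        self.counts = {s: 0 for s in SETPOINTS}    # times each was tried

    def choose(self):
        untried = [s for s in SETPOINTS if self.counts[s] == 0]
        if untried:
            return untried[0]  # try every setpoint at least once
        if random.random() < EPSILON:
            return random.choice(SETPOINTS)  # explore
        # Exploit: the setpoint with the lowest average cost so far.
        return min(SETPOINTS, key=lambda s: self.totals[s] / self.counts[s])

    def update(self, setpoint, cost):
        """Record the energy cost observed after using a setpoint."""
        self.totals[setpoint] += cost
        self.counts[setpoint] += 1
```

Each day the agent would call `choose()`, set the thermostat, then call `update()` with the observed cost; over time its averages reveal which setting is cheapest, something a purely predictive model trained only on last year's data could not discover for settings it never saw.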
WT: You have said that ‘Reinforcement Learning is the next big thing in AI’.
How did your area of research in reinforcement learning lead to founding RL Core?
White: In my research career, my focus has been on algorithm development. This kind of algorithm work is important, because the ideas created in the lab can assist a wide variety of applications. However, there is often a disconnect from real problems, resulting in a lack of impact on important applications.
When I was approached to work on the use of reinforcement learning for drinking water treatment, I immediately recognized the potential of reinforcement learning technology since water treatment touches so many people and industries.
This project began with a local engineering firm, ISL Engineering, whose vision was to determine how AI could benefit water treatment.
In the past year, it became clear that it was time to take this technology out of the lab and into the real world. The natural way to do this was to create a start-up focused on the software (the brain), collaborating with ISL on what they do best – integration (the body).
WT: Most recently, you received another especially important distinction. The University of Alberta has launched a new fund to invest in start-up ventures.
Congratulations on being the first recipient of this fund. Please tell us about the fund and the impact of this recognition for your start-up.
White: We are thankful that the University of Alberta is so forward-looking and willing to pursue such an ambitious idea. The University of Alberta is a huge economic engine, and The Innovation Fund supports a growing ecosystem of innovative companies. This fund is another step towards bolstering the economy by recognizing world-class research and exceptional talent amongst students and faculty within the University to help them found companies.
We are thrilled to be recognized. This is a signal to the world of their belief in the strength of our team and our technology, and it shows a commitment to solving critical sustainability issues.
The Innovation Fund and its CEO, Sheetal, are fantastic partners, helping us find connections and navigate our ambitious goals to improve efficiency in industrial control using reinforcement learning.
WT: Please explain how RL Core technology works.
White: We have named our start-up to include ‘RL’ because we believe ‘Reinforcement Learning’ (RL) will become a well-known term, like AI.
RL leverages what machine learning does best – finding patterns and trends in massive data streams.
In our drinking water systems, for example, sensors are read once a second, with more than 100 sensor readings at each step and millions of samples per year. This is too much information for a person to sift through, but it is useful for a machine learning system. More information about what is happening in the system lets the agent make better predictions, and an RL agent uses those predictions to make decisions automatically.
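A quick back-of-envelope check of that data volume, using hypothetical round numbers, shows why no operator could monitor it all:

```python
# Back-of-envelope check of the data volume (hypothetical round numbers).
SECONDS_PER_YEAR = 60 * 60 * 24 * 365   # 31,536,000 seconds in a year
NUM_SENSORS = 100                        # "more than 100 sensor readings"

samples_per_sensor = SECONDS_PER_YEAR    # one reading per second
total_samples = samples_per_sensor * NUM_SENSORS

print(samples_per_sensor)  # 31536000 -- tens of millions per sensor
print(total_samples)       # 3153600000 -- billions across the plant
```

Even a single sensor read once a second produces over thirty million samples a year; across the whole plant it is billions.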
The most deployed flavor of AI is typically used to imitate human performance. It is trying to become as good as the experts. Those systems are trained on many examples created by humans who know the correct answers. For example, a medical AI diagnostic tool might classify a scan as cancerous or not, based on labeled images from doctors. Another example might be a chatbot trained on the many logs of conversations between people on the internet.
Imitating people works well only when we have the data and experts that make very good decisions.
However, for many problems even the most skilled humans do not know the right answers. We may not even know good answers.
Industrial control systems are full of these types of problems. This is why our focus is different.
It is difficult to constantly tune and calibrate many interdependent systems, such as drinking water treatment. Even expert operators cannot perfectly adjust the system to get optimal performance—and in many cases it takes years to become an expert operator.
People are good at higher-level control and decision-making, what we often call common sense decision making. They are not as well equipped to monitor a space with lots of sensory information, continuously, across a large process with many interacting components. This is where our approach comes in.
We can help operators and engineers solve these problems well because of two key strengths. One is that our algorithms are designed to reason counterfactually. That is to say, the agent can imagine, in its head, diverse ways of controlling the plant.
This requires some prediction of what these alternative outcomes could have been, which is where machine learning comes in. Logs of data from previous control strategies and from operators, can be used to try to get something as good as or better than the current approach.
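The idea of using logs from previous control strategies to evaluate an alternative can be illustrated with a toy example. This sketch assumes a simplified one-step setting and uses inverse propensity weighting, one standard off-policy estimator from the research literature; it is an illustration of the general principle, not necessarily the method RL Core uses.

```python
def off_policy_value(logs, new_policy):
    """Estimate the average reward a new policy *would have* earned,
    using only data logged under an old behaviour policy.

    Each log entry is (state, action, prob_of_action_under_old_policy,
    reward).  `new_policy(state)` returns a dict of action probabilities.
    Uses the inverse-propensity-weighted estimator."""
    total = 0.0
    for state, action, behaviour_prob, reward in logs:
        # Reweight each logged outcome by how much more (or less)
        # likely the new policy is to take the same action.
        weight = new_policy(state).get(action, 0.0) / behaviour_prob
        total += weight * reward
    return total / len(logs)
```

Given logs where an old controller picked each of two actions half the time, this estimator can answer the counterfactual "what if we had always picked the better action?" without ever running the new policy on the real plant.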
The other key strength is that the controllers can constantly adapt in real time. This means that even if the data is limited and even though the system changes over time, the system can constantly adapt its solution to continue to get superior performance.
WT: How is RL Core able to improve current drinking water treatment plants – many of which already have some level of automation?
White: The fact that many drinking water systems already have some level of automation is an advantage. It is simpler to adjust existing systems by incorporating our improved controllers than it is to build from scratch. We are primarily improving the existing automation in the system, or automating parts that are currently manually tuned. RL Core is about making operators more effective and making operations more sustainable.
WT: Can you give us specifics on how this is achieved?
White: We are currently working with an ultrafiltration system. As an example, consider three control parameters in an ultrafiltration system: chemical dosing rate for coagulation, aeration, and backwash duration.
Chemical dosing rates are adjusted daily by operators, using jar tests. This involves taking water samples and visually inspecting the impacts of different chemical concentrations. This procedure is time consuming and requires experienced operators to make accurate visual assessments.
Aeration is typically performed right before and after backwashing, to help remove the gunk from the filter. It is usually set statically at commissioning and never changed, but is known to be an important choice for an effective backwash. It can be performed at many different times, so the space of options is too big for most people to reason about or test.
Finally, backwash duration might be changed after commissioning, but only infrequently, potentially from remote monitoring by a commissioning company.
These are examples of parameters that are manually adjusted (chemical dosing), left static (aeration), or changed only very infrequently, again through manual adjustment (backwash duration).
RL can be used to adjust these continually (every minute), to reduce chemical and electricity usage and improve membrane health.
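That continual, minute-by-minute adjustment can be sketched as a simple control loop. The `plant` and `agent` interfaces below are hypothetical stand-ins to show the shape of the loop, not RL Core's actual API, and the once-a-minute cadence follows the figure above.

```python
import time

def control_loop(agent, plant, steps, interval_seconds=60.0):
    """Sketch of the outer loop: each minute, observe the plant, let the
    agent adjust its settings (dosing rate, aeration, backwash duration),
    then feed back the resulting cost so the agent keeps adapting."""
    obs = plant.read_sensors()
    for _ in range(steps):
        settings = agent.act(obs)        # e.g. {"dose_rate": 1.2, ...}
        plant.apply_settings(settings)
        time.sleep(interval_seconds)     # wait for the plant to respond
        next_obs = plant.read_sensors()
        cost = plant.cost(next_obs)      # chemicals + electricity used
        agent.learn(obs, settings, cost, next_obs)
        obs = next_obs
```

Because the agent's `learn` step runs inside the loop, the controller keeps adapting as the raw water and the membranes change over time, rather than being tuned once at commissioning.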
WT: Moving forward...what is next for RL Core?
White: RL Core is at an early stage. The research in the university helped us gain insights into the problem but was not focused on commercialization. We know we still have a lot to learn from water experts and will be spending time listening.
Our mission is to bring this technology to the water treatment industry. We know these RL algorithms work and we can see similar approaches doing amazing things in other domains.
Systems that try to emulate human performance have become a standard component of many products. Reinforcement learning has not yet reached that point.
We believe this technology can help in many process control problems, given a dedicated team with expertise in RL.
As with any area, RL will be well-suited to some process control problems and not to others. A key step in bringing this technology into industrial control will be to better understand where we should use RL, not just how. Our current focus is drinking water treatment, but we will be actively looking for other problems to solve.