
Safety in Industrial AI Design

By Saranyan Vigraham

In an industry where unanticipated mechanical behavior or equipment failure can be fatal, safety is an ever-present concern and priority. With industrial AI solutions now operating in factories all over the world, we have to ensure that every process is performing exactly as it was designed.

Safety should never be treated as an add-on or an afterthought; it should be an integral part of the process, from initial design through day-to-day operation. It's important to understand that safety in industrial AI is applied technology as well as a philosophy and a safety-centered mindset. In this article, we'll discuss general principles of safety in industrial AI design and provide a list of guidelines to ensure operational safety.

Safety in AI Design

First, safety needs to be part of the design, part of the algorithm. There are specific touchpoints that need to be considered when designing these algorithms to improve safety standards.

Objective function design

Good objective function design is paramount in creating safe AI solutions, but it is challenging to achieve because of the complexity of real operating environments. A well-balanced objective function ensures that the AI system is secure and reliable, so that when the AI solution "performs as designed," that performance includes safe operation. In other words, our design goal is to ensure that our AI system does what it's supposed to do, with no deviations.
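As a rough illustration, one common way to make an objective "include safe operation" is to subtract a weighted safety penalty from the task objective. The sketch below is hypothetical: the names (task_reward, safety_cost, lambda_safety) and the temperature limit are ours, not from any particular system.

```python
# A minimal sketch of a safety-penalized objective. Real objectives are
# domain-specific; the names and numbers here are illustrative only.

def objective(state, action, task_reward, safety_cost, lambda_safety=10.0):
    """Task performance minus a weighted penalty for unsafe operation.

    A larger lambda_safety makes the optimizer give up performance
    before it gives up safety.
    """
    return task_reward(state, action) - lambda_safety * safety_cost(state, action)


# Example: a throughput objective penalized for exceeding a temperature limit.
reward = lambda s, a: s["throughput"]
cost = lambda s, a: max(0.0, s["temperature_c"] - 450.0)
print(objective({"throughput": 120.0, "temperature_c": 470.0}, None, reward, cost))
# -> 120.0 - 10.0 * 20.0 = -80.0
```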

A need for constraints

AI models can find zones of operation that a human operator would never consider. Consequently, we run into an AI paradox: how can we let AI models find unexplored solutions when those are the riskiest? This is an ongoing journey that has very little to do with the technology itself and everything to do with how we build precautions. Initially, we need to build constraints into the model so that safety is inherently baked in, and only relax them later with the utmost caution. Such diligence will also help us identify issues arising from data quality and completeness, and from shortcomings in our objective functions.
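One simple way to bake constraints in is to clamp every model output to an operator-approved envelope before it reaches the equipment. A minimal sketch, with purely illustrative signal names and bounds:

```python
# Hypothetical operating envelope; in practice these bounds come from
# process engineers and equipment ratings, not from the model.
SAFE_ENVELOPE = {
    "temperature_c": (20.0, 450.0),
    "pressure_bar": (1.0, 12.0),
}

def constrain(setpoints):
    """Clamp each proposed setpoint into its allowed range."""
    constrained = {}
    for name, value in setpoints.items():
        low, high = SAFE_ENVELOPE[name]
        constrained[name] = min(max(value, low), high)
    return constrained

print(constrain({"temperature_c": 480.0, "pressure_bar": 0.4}))
# -> {'temperature_c': 450.0, 'pressure_bar': 1.0}
```

Relaxing a constraint then becomes an explicit, reviewable change to the envelope rather than an opaque change to the model.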

Safety in AI Operation

Functional safety means that AI systems perform as designed and as expected. There are operational procedures and checks we can build into our process. The following operating guidelines should be included in every AI product or solution:

Continuous system monitoring

For any AI solution in production, continuous monitoring of the system is an absolute must, and how it will be done should be settled before deployment. With real-time monitoring, if any output falls outside normal operating parameters or an unknown pattern emerges, steps to correct the situation can be taken immediately. The importance of system monitoring cannot be overstated: if you do not have some level of monitoring ready to go, do not deploy your AI solution.
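A sketch of what such a check might look like, assuming a simple per-signal band of normal operation (signal names and bounds are illustrative):

```python
# Normal operating bands, defined before deployment.
NORMAL_RANGE = {
    "vibration_mm_s": (0.0, 7.1),
    "temperature_c": (20.0, 450.0),
}

def check_reading(name, value):
    """Classify a live reading against its normal operating band."""
    if name not in NORMAL_RANGE:
        return "unknown_pattern"   # never-seen signal: escalate immediately
    low, high = NORMAL_RANGE[name]
    if not low <= value <= high:
        return "out_of_range"      # outside normal parameters: correct now
    return "ok"

print(check_reading("vibration_mm_s", 9.3))   # -> out_of_range
```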

Safety override, both automatic and manual

There will always be some amount of risk in the operation of any AI system. Safety overrides must be available, both those that are auto-triggered and those that are initiated by a human operator. Not only is this a critical safety feature, but it also provides a measure of confidence that humans are still in control.
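In code, the automatic and manual paths can converge on a single disengage decision. A hypothetical sketch:

```python
class SafetyOverride:
    """Combines auto-triggered and operator-initiated overrides."""

    def __init__(self):
        self.manual_stop = False          # flipped by a human operator

    def request_manual_stop(self):
        self.manual_stop = True

    def should_disengage(self, active_alarms):
        # Auto-trigger on any active safety alarm, or on operator request.
        return self.manual_stop or len(active_alarms) > 0

override = SafetyOverride()
override.request_manual_stop()
print(override.should_disengage(active_alarms=[]))   # -> True
```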

System failover mode

A failover mode, or safe state, ensures that the system returns to a known stable state in the event of any violation of safety boundaries. This failover state needs to be built into the system from the beginning for it to work effectively.
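A failover routine can be as simple as driving every actuator to a setting decided at design time. A sketch, with hypothetical actuator names:

```python
# Predefined safe state, chosen at design time rather than at runtime.
SAFE_STATE = {"inlet_valve": "closed", "heater": "off", "conveyor": "stopped"}

def enter_safe_state(send_command):
    """Drive every actuator to its known safe setting."""
    for actuator, setting in SAFE_STATE.items():
        send_command(actuator, setting)

# Stand-in for a real hardware interface:
enter_safe_state(lambda actuator, setting: print(f"{actuator} -> {setting}"))
```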

Thorough field testing and validation

Right now, it is not possible to have unbiased data that perfectly represents the real world or to create a perfect simulation. Because of this, there has to be thorough testing of the system before it is deployed.

In the absence of complete representational data or a simulator, we must test our AI systems in an actual production environment — after we have made sure we have enough safety checks built-in.

Graceful state change

A graceful state change is needed for those instances when an AI system suddenly fails in the middle of its operation. There is no guarantee that an operator will be able to bring the system back to a safe state manually, so graceful degradation is a must for any AI system.
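One way to guarantee a graceful exit is to wrap the control loop so that any unexpected failure lands the system in its safe state rather than wherever it happened to crash. A minimal sketch, assuming a safe-state routine like the one above:

```python
def run_control_loop(step, enter_safe_state):
    """Run the AI control loop; on any failure, degrade to the safe state."""
    try:
        while step():          # step() returns False when shutdown is requested
            pass
    except Exception:
        enter_safe_state()     # leave the plant safe, then surface the error
        raise

# Illustrative usage with stand-in callables:
run_control_loop(step=lambda: False,
                 enter_safe_state=lambda: print("entering safe state"))
```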

Stick to high confidence actions

To ensure safety, you should restrict AI operations to those that are above a “high confidence” threshold. Other operating conditions with lower confidence scores can be used to improve the system and refine the AI models.
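A sketch of such a confidence gate (the 0.95 threshold is illustrative; a real threshold is set per application):

```python
CONFIDENCE_THRESHOLD = 0.95
review_queue = []   # low-confidence cases kept for model refinement

def gate(action, confidence):
    """Execute only high-confidence actions; bank the rest for retraining."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return action            # confident enough to act
    review_queue.append((action, confidence))
    return None                  # defer to heuristics or a human operator

print(gate("increase_feed_rate", 0.82))   # -> None (queued for review)
```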

Secure backup of rules and heuristics

Whenever something unexpected happens, or when the system goes down, the consequences can be severe. If we have to disengage AI control and allow the operator to take over, a failsafe collection of rules and heuristics will help us gracefully leave the system in a safe state.
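A sketch of such a backup: an ordered list of human-authored rules, each pairing a trigger with a safe action (the rules themselves are illustrative, not from any real plant):

```python
# Ordered, human-authored rules used when AI control is disengaged.
FALLBACK_RULES = [
    (lambda s: s["pressure_bar"] > 10.0,   {"relief_valve": "open"}),
    (lambda s: s["temperature_c"] > 400.0, {"heater": "off"}),
]

def fallback_action(state):
    """Return the first triggered safe action, or hold steady if none fire."""
    for condition, action in FALLBACK_RULES:
        if condition(state):
            return action
    return {}   # nothing triggered: hold the current (safe) settings

print(fallback_action({"pressure_bar": 11.2, "temperature_c": 350.0}))
# -> {'relief_valve': 'open'}
```

Because these rules are simple and human-readable, they can be reviewed and kept current independently of the AI models they back up.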

AI Safety as a Principle

Industrial AI is becoming big business. When a mistake has the potential to kill, the stakes are high, and safety measures have to be in place from the start. But if you pay attention to the guidelines above, your system should perform safely and as designed.

Originally published at petuum.com on September 12, 2019.
