Redundancy, ROI And Risk In Industrial Control Systems

Engineers are used to making financial decisions based on factual inputs and realistic assumptions. The return on investment (ROI) accounting method has long been a safe and consistent decision-making tool for determining whether to invest in a project. However, when it comes to evaluating redundancy in industrial control systems, this method doesn’t hold up.

Greg Lynch

Apr 11, 2013

ROI is defined as: Net benefit of a project (over a set period of time) / amount invested.

A control system without redundancy will keep running smoothly under normal operating conditions. Since the extra investment in duplication does not increase output, the ROI of redundancy is negative: (extra output ($0) - cost ($X)) / cost ($X) = -1, which on paper is a 100% loss.
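The arithmetic above can be checked in a few lines. This is only a minimal sketch: the function name `roi` and the $5,000 cost are illustrative assumptions, and any positive cost gives the same result.

```python
def roi(net_benefit: float, amount_invested: float) -> float:
    """ROI = net benefit of a project over a set period / amount invested."""
    if amount_invested <= 0:
        raise ValueError("amount_invested must be positive")
    return net_benefit / amount_invested


# Under normal operation redundancy adds no output, so the net benefit
# is simply the negative of what was spent (hypothetical $5,000 outlay).
cost = 5_000.0
extra_output = 0.0
redundancy_roi = roi(extra_output - cost, cost)
print(redundancy_roi)  # -1.0
```

Because the result is -1 no matter how much or how little is spent, the ROI figure gives no way to compare one redundancy proposal with another.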

The true value of redundancy in a control system is shown when a critical failure occurs and there is no loss of production, no damage to equipment and no injury or death.

The right mindset for assessing the amount of necessary redundancy is similar to insurance analysis — what are the chances of a failure occurring and what are the consequences?

For example: If an investment of $5,000 in a backup PLC eliminates a risk of $50,000 in downtime, then the risk/reward calculation is easy. On the other hand, if an investment of $50,000 only eliminates a potential $25,000 consequence, then the outlay is less justified (unless the frequency of failure is high).
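One way to sketch the insurance-style comparison is to weight each consequence by how often the failure is expected to occur. The dollar figures below are the ones from the example; the failure rates and the five-year evaluation period are hypothetical assumptions chosen only for illustration, as are the function names.

```python
def expected_loss_avoided(failures_per_year: float,
                          consequence: float,
                          years: float) -> float:
    """Expected loss that the redundancy eliminates over the period."""
    return failures_per_year * consequence * years


def net_value_of_redundancy(cost: float,
                            failures_per_year: float,
                            consequence: float,
                            years: float) -> float:
    """Positive when the expected loss avoided exceeds the outlay."""
    return expected_loss_avoided(failures_per_year, consequence, years) - cost


YEARS = 5  # hypothetical evaluation period

# $5,000 backup PLC against $50,000 of downtime,
# assuming one failure every ten years.
case_a = net_value_of_redundancy(5_000, 0.1, 50_000, YEARS)

# $50,000 outlay against a $25,000 consequence, same assumed failure rate.
case_b = net_value_of_redundancy(50_000, 0.1, 25_000, YEARS)

# Same outlay and consequence, but assuming one failure every year.
case_c = net_value_of_redundancy(50_000, 1.0, 25_000, YEARS)

for name, value in (("A", case_a), ("B", case_b), ("C", case_c)):
    print(f"case {name}: {value:,.0f}")
```

Under these assumed rates, case A comes out positive, case B negative, and case C positive again, which mirrors the point that a large outlay can still be justified when failures are frequent.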

Determining the amount of necessary redundancy requires an engineer to perform risk management analysis. Risk formulas often rely on vague and ambiguous assumptions, some of which may not pan out in the long run.

How Much Redundancy Do You Need in Your Manufacturing System?

#1 — Where are the critical points?

To justify an investment in redundancy, the critical points in the production system must be determined. Look at both the upstream and downstream stages and determine if redundant equipment may help avoid cascading failures and bottlenecks in the manufacturing process.

#2 — What are the consequences?

There are many types of consequences to consider:

Financial — This includes everything from a loss of raw material to a loss of potential sales since delivery can’t be guaranteed.
Economies of scale — Downtime in a large production facility can have a big impact on the per-unit cost when manufacturing efficiencies are not maintained.
Damage to equipment — When production is adversely affected, the true cost of replacing a piece of equipment is usually significantly higher than its purchase price alone.
Injury to workers — Workplace morale is hard to maintain when friends and coworkers are injured or killed.
Bad publicity — It’s hard to come back from bad publicity (e.g., pipeline leaks), and preventing negative PR is usually less expensive than trying to win back public opinion.

#3 — What is the frequency of failures?

Mechanical components wear out faster than electrical ones, so it may be beneficial to focus investments on the moving parts and sensors. If the Mean Time Between Failures (MTBF) is low, a regularly scheduled maintenance program may help to prevent unexpected downtime.
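To make the link between MTBF and failure frequency concrete, the sketch below converts an MTBF into the chance of a failure within a year. It assumes a constant failure rate (the exponential reliability model), which is a common simplification that ignores wear-out, and it assumes two redundant units fail independently and are not repaired during the period. The 50,000-hour MTBF and the function names are hypothetical.

```python
import math


def failure_probability(mtbf_hours: float, mission_hours: float) -> float:
    """Probability of at least one failure within mission_hours,
    assuming a constant failure rate of 1 / MTBF."""
    return 1.0 - math.exp(-mission_hours / mtbf_hours)


def redundant_pair_failure_probability(mtbf_hours: float,
                                       mission_hours: float) -> float:
    """Probability that both units of an identical pair fail in the period,
    assuming independent failures and no repair."""
    return failure_probability(mtbf_hours, mission_hours) ** 2


HOURS_PER_YEAR = 8_760   # continuous operation
mtbf = 50_000            # hypothetical MTBF in hours

single = failure_probability(mtbf, HOURS_PER_YEAR)
pair = redundant_pair_failure_probability(mtbf, HOURS_PER_YEAR)
print(f"single unit: {single:.1%}, redundant pair: {pair:.1%}")
```

The resulting probabilities can stand in for the chance of failure in an insurance-style comparison of outlay against consequence. Real components with wear-out behaviour will deviate from this model, so the output is a rough guide rather than a prediction.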

#4 — What preventative measures do you have in place?

Equipment failures can be prevented by regular service and maintenance, and by well-trained employees who can spot trouble before it happens. Have a training program in place to share knowledge and experience among those on the front lines of production.

Redundancy and reliability engineering are specialized fields in manufacturing control systems. Knowing what could possibly go wrong and the severity of the consequences shows good foresight, strong planning and responsible engineering.

Greg Lynch is a professional engineer with over 18 years of experience in control system design, manufacturing support and reliability engineering design. He is based in Kelowna, BC and operates the engineering firm Industrial Control Systems Engineering Corp. Learn more at www.icsenggroup.com.