A Practical Approach to Reliability-Centered Maintenance

RCM is not complicated, but requires groundwork and the understanding that it is an ongoing process, not a one-time project.

Most writers on the subject of Reliability-Centered Maintenance (RCM) emphasize that it includes a systematic approach to determining what must be done to ensure the reliability and availability of physical assets in a production setting. However, there is much more to RCM than determining what makes a piece of equipment reliable. Determining what to do and doing it are different things.

Successful RCM also entails implementing the results in the computerized maintenance management system (CMMS) and other plant databases, executing the work, and continuously improving the program developed during the RCM analysis. Together, these define a work-management process for achieving the business goals of a facility.

Determining business goals

To set the stage for an RCM study, the first thing plant management must do is identify business goals. These are tied to 1) daily production quantities and targets; or 2) the reduction or elimination of adverse health, safety and environmental (HSE) impacts. To direct all plant activities toward these two major areas, plant management or senior management within the corporation has to ask and answer questions regarding the perceived corporate goals and whether they are achievable.

Our suggested approach to getting consistent answers is to produce a criticality matrix. This can be simple or difficult, depending upon the industry. Each U.S. refinery, for example, must do a hazard and operability (HAZOP) study that uses a corporate risk matrix outlining health and safety impacts, and may include environmental, production and other elements such as maintenance cost reduction. This can serve as the basis for developing a criticality matrix.

Whatever the industry, though, plant management already knows what its plant is designed to produce and the throughput for which it was designed. If management pushes a plant designed to process 200,000 barrels a day to process 210,000 barrels a day, for example, it's difficult to minimize HSE events and meet the increased demands. Unless extreme care is taken to update the plant's maintenance strategy, something must give. A criticality matrix reveals such conclusions in plant-specific and equipment-specific terms.

When creating a criticality matrix, analysis often uncovers incorrect assumptions made by plant personnel. That was the case at one plant that was just beginning RCM. Initial analyses began with a close look at an integrated production line of 15 machines with conveyors between them. The production engineers operated this line as though everything was critical. As a result, whenever anything failed, they immediately scheduled maintenance, often in near-panic mode.

In reality, nothing on that line was production-critical. The plant ran five days a week, 16 hours a day. If something failed on Monday, maintenance could be performed on Monday night or Tuesday. The company would still make its required deliveries on the following Monday.

Be aware, however, that having no production-critical equipment does not mean a plant should not take steps to reduce the cost of maintenance. Neither does it imply that other items are not health, safety or environmentally critical. In fact, at this plant, a few items were safety critical: unguarded moving components and non-existent safety interlocks, for instance. These were remedied, but understanding that nothing on the line was production-critical proved a revelation to personnel.

In preparing for RCM, once the owner/operator sets the business goals and before any RCM study is performed, the maintenance strategist needs to produce supportable scenarios. These should discuss how the conduct of this unique RCM study will affect the achievement of business goals. In this way, there is a mutually understood expectation of what the likely impact of the RCM study will be before it starts.

Once the study is concluded, the outcomes implemented and the revised maintenance regimen executed, it will be possible to compare what is happening to what was expected. Thus, the business case for the RCM reflects the pre-established business goals.

A logical, repeatable process

An RCM process entails several steps. After identifying business goals, identify functions, functional failures, failure modes and effects. Next, build the tasks, which is the creation of the maintenance program. All of this is repeatable. It's a set process, a kind of flow chart or thought process for plant personnel. If the failure of a piece of equipment does not have a negative impact on HSE or the production target, according to the criticality matrix, that equipment is not critical.
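
As a rough illustration, the criticality test described above can be written as a short decision rule. This is only a sketch: the field names, the idea of "slack hours" the schedule can absorb, and the sample values are assumptions made for the example, not part of any particular criticality matrix.

```python
from dataclasses import dataclass


@dataclass
class FailureImpact:
    """Hypothetical record of what happens when one piece of equipment fails."""
    equipment: str
    affects_hse: bool       # any health, safety or environmental consequence
    downtime_hours: float   # time needed to restore the function
    slack_hours: float      # downtime the schedule can absorb before a target is missed


def classify(impact: FailureImpact) -> str:
    """Apply the same yes/no questions to every piece of equipment."""
    if impact.affects_hse:
        return "HSE-critical"
    if impact.downtime_hours > impact.slack_hours:
        return "production-critical"
    return "non-critical"


# Illustrative values only.
conveyor = FailureImpact("conveyor", affects_hse=False, downtime_hours=8, slack_hours=24)
unguarded_drive = FailureImpact("unguarded drive", affects_hse=True, downtime_hours=2, slack_hours=24)

print(classify(conveyor))
print(classify(unguarded_drive))
```

Because every analyst runs the same questions in the same order, two people looking at the same equipment should reach the same answer, which is the point of a repeatable process.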

When RCM is correctly done, everyone in the plant will have an understanding of what's critical and what's not, because they all use that same process to determine criticality. Having determined what's critical, plant personnel, usually with the help of experts, can use a software program to help them determine maintenance standards based on the failure effects, failure modes and failure causes.

For example, if the task is to analyze a motor, the program will show plant personnel the motor's typical failure modes as well as its typical failure causes and maintenance standards, based on the motor's criticality. The process is consistent for motors of a given size and type, and is repeated for each piece of equipment in the plant. This is a monumental task if you don't use the right RCM technique. Identification codes for equipment are developed and recorded into any leading CMMS for continued, consistent use.

After a plant builds a list of critical and non-critical equipment (and an understanding of how each fails and why), maintenance standards can be assigned for both critical and non-critical equipment. In the process, plant personnel decide which non-critical pieces can be run to failure because they are not sufficiently important and no financial incentive exists to maintain them. Among the tasks that come from these steps are those to be handled by operators, as well as preventive, predictive and condition-monitoring tasks.

Operator tasks might include looking for and recording flows and pressures.

Predictive tasks include vibration monitoring and oil analysis.

Condition monitoring includes monitoring pressure drops across filters, but might also include monitoring pressures and flows on-line via the DCS (distributed control system).
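
The filter example above amounts to a simple threshold check. The sketch below assumes pressure readings in kPa and a hypothetical 50 kPa limit; a real limit would come from the filter manufacturer or plant standards.

```python
def filter_needs_change(inlet_kpa: float, outlet_kpa: float, limit_kpa: float) -> bool:
    """Return True when the pressure drop across the filter exceeds the limit."""
    return (inlet_kpa - outlet_kpa) > limit_kpa


# Hypothetical (inlet, outlet) samples, as they might be read from the DCS.
readings = [(410.0, 395.0), (410.0, 380.0), (408.0, 352.0)]
LIMIT_KPA = 50.0  # assumed limit, for illustration only

alerts = [filter_needs_change(inlet, outlet, LIMIT_KPA) for inlet, outlet in readings]
print(alerts)
```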

The next logical step in the evolution of computerized asset management systems is to include knowledge management functions to improve decision support. These approaches are now commercially available through decision-support systems which use fault-tree logic to establish rules and actions for fault recognition and correction.

After data is collected through RCM-prescribed maintenance tasks, automated notification is sent which includes specific corrective actions as well as severity and confidence indicators for the fault.

Beyond preventive maintenance

Here's another way to describe RCM: Doing the right maintenance to the right equipment at the right time, knowing what it means and learning from it perpetually. RCM requires a combination of time-based (lubrication), predictive (vibration), condition-based (changing a clogged filter) and repair tasks. Thus, within RCM, the entire maintenance strategy is a combination of letting things fail that you know are not important, then doing preventive, predictive, condition-based and repair tasks in the appropriate combination for the important items.

Preventive or time-based tasks, which were the basis of most proactive maintenance until the 1950s, assume that failures can be prevented with regular service based on calendar time or running time measured by a meter. RCM experts now recognize that most failures thought to be time-based are random or externally caused, and that time-based tasks don't work for random failures. Overall, studies suggest that between 77% and 92% of machine failures occur in a manner that is not time-related. Therefore, time-based preventive maintenance alone is likely to be ineffective against most failures.

Nevertheless, many maintenance programs are still time-based. Many companies perform time-based maintenance because they can't get far enough ahead in their programs to initiate other types of programs. One reason for this is that some perceive maintenance activities as unpredictable and non-repetitive, and therefore unsuitable for systemization. However, improved capture of maintenance data through information technology systems, coupled with better implementation techniques such as predictive maintenance (PdM), total productive maintenance (TPM) and operator-driven reliability (ODR), is changing this perception. An event that may not appear to be normal but that occurs at regular intervals (such as a motor failing every six months) can still be classified as a normal event. About 80% of maintenance decisions can be regarded as routine when one is guided by common sense and good engineering practice tempered with past experience.

Root-cause failure analysis

If you're running an RCM program and have recurrent failures for which there are no apparent causes, conduct a root-cause failure analysis (RCFA). If, for example, the troublesome component is a pump, root-cause analysis may show that it is undersized for the application. Then one would likely redesign the process and put in one bigger pump or an additional smaller pump to handle the application.

In a mature RCM program, RCFA plays roles beyond pointing to the need for a redesign. Here, tasks are getting done, reports are getting filed and equipment histories are available for analysis. At this level, past work orders and equipment histories can be viewed and analyzed. One might determine, for example, that even though seals on pumps generally last about four years, one pump had four seal failures in the past year. Such situations call for an RCFA.

For every kind of equipment, one can set limits within the CMMS requesting that the system flag situations in which a piece of equipment had more than x failures of type y. RCFA is also called for when an event adversely affects a business goal. Such situations include those in which there is an environmental release, an injury or a big production stoppage.
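
The "more than x failures of type y" rule can be sketched in a few lines. The work-order fields, equipment tags and dates below are hypothetical; a real CMMS exposes its history through its own reports or interfaces, and the threshold and time window are choices the plant would make.

```python
from collections import Counter
from datetime import date, timedelta


def flag_repeat_failures(work_orders, max_failures, window_days, today):
    """Return (equipment_id, failure_type) pairs with more than
    max_failures occurrences in the last window_days."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        (wo["equipment_id"], wo["failure_type"])
        for wo in work_orders
        if wo["date"] >= cutoff
    )
    return sorted(key for key, n in counts.items() if n > max_failures)


# Hypothetical history: one pump with four seal failures in the past year.
history = [
    {"equipment_id": "P-101", "failure_type": "seal", "date": date(2022, 1, 15)},
    {"equipment_id": "P-101", "failure_type": "seal", "date": date(2024, 2, 10)},
    {"equipment_id": "P-101", "failure_type": "seal", "date": date(2024, 5, 3)},
    {"equipment_id": "P-101", "failure_type": "seal", "date": date(2024, 8, 19)},
    {"equipment_id": "P-101", "failure_type": "seal", "date": date(2024, 11, 27)},
    {"equipment_id": "P-102", "failure_type": "seal", "date": date(2024, 6, 1)},
]

flagged = flag_repeat_failures(history, max_failures=1, window_days=365,
                               today=date(2024, 12, 31))
print(flagged)
```

Here only P-101 is flagged for RCFA; the single seal failure on P-102 and the older failure outside the one-year window are ignored.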

It's important to remember, however, that different managers and companies often have specific approaches to root-cause analysis. It's critical that only one approach becomes part of the everyday work-control process. To be truly effective, RCFA, by whatever method or process, must move beyond being "event based" and become "pattern/sequence based." If events are addressed and solved in isolation, without being turned into learning and knowledge, RCFA remains largely reactive.

Testing for performance, function

Several types of tasks result from a comprehensive RCM program, including performance testing and functional testing. The idea behind performance testing is to determine when it is best to replace or rebuild equipment rather than continuing to run it. With a pump, for example, there's a pump curve. There's an optimum part of the curve where the pump will provide optimum performance. There are areas below and above that optimum. A pump running outside the optimum will wear out prematurely. To get maximum throughput, the pump was designed to pump at, say, 500 gal. per minute. However, somewhere along the line, the components inside the pump ceased to remain within specifications. Perhaps the impeller corroded. In such cases, to meet throughput requirements, someone might ramp up the motor to make the pump work more.

At this point, the process engineer should ask for a performance test. Then, someone will record the pump's inlet and outlet pressures and the amount of power consumed to drive it. From such values it is possible to calculate the pump's performance curve.
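
One point on the performance curve can be estimated from those readings with standard pump relations. The sketch below assumes that the flow rate is also measured during the test, that the fluid is water, and that SI units are used; the sample readings are invented for illustration.

```python
G = 9.80665  # standard gravity, m/s^2


def pump_operating_point(p_in_pa, p_out_pa, flow_m3s, input_power_w, density=1000.0):
    """Return (head in metres, efficiency) for one set of test readings.

    Head is the pressure rise expressed as a column of fluid.
    Efficiency is hydraulic power delivered divided by power consumed.
    """
    rise_pa = p_out_pa - p_in_pa
    head_m = rise_pa / (density * G)
    hydraulic_power_w = flow_m3s * rise_pa
    return head_m, hydraulic_power_w / input_power_w


# Hypothetical readings: 0.0315 m^3/s is roughly 500 US gal/min.
head, efficiency = pump_operating_point(
    p_in_pa=100_000, p_out_pa=500_000, flow_m3s=0.0315, input_power_w=18_000
)
print(f"head = {head:.1f} m, efficiency = {efficiency:.0%}")
```

Repeating the calculation at several flow rates and comparing the points with the manufacturer's curve shows how far the pump has drifted from its design performance.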

Functional testing differs from performance testing. Consider a system in which a large pump is always used unless there is a pressure drop, at which time a pressure switch stops the primary pump and automatically starts a backup pump. The most important piece of equipment on that two-pump skid is that little $5 low-pressure switch. If it doesn't work, it's as though the backup pump isn't even there, because the backup won't start. Functional testing entails making sure the pressure switch is operating properly and that the backup pump will operate.

In some plants, the primary pump may run for a year, while the backup pump remains idle. By contrast, a functional test performed monthly might entail simulating a primary pump failure to ensure that the resulting pressure drop triggers the switch and activates the backup pump without an interruption of service. If the functional test fails, there is still time to correct the problem before an actual failure of the primary pump might occur.
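
The logic of such a test can be shown with a toy model of the two-pump skid. The class, its fields and the pressure values are assumptions made for this sketch; it stands in for the real switch and starter circuits, which would be tested on the equipment itself.

```python
from dataclasses import dataclass


@dataclass
class PumpSkid:
    """Toy model of a primary pump, a backup pump and a low-pressure switch."""
    switch_setpoint_kpa: float
    switch_healthy: bool = True
    primary_running: bool = True
    backup_running: bool = False

    def on_pressure(self, pressure_kpa: float) -> None:
        """Change over to the backup if a healthy switch sees low pressure."""
        if self.switch_healthy and pressure_kpa < self.switch_setpoint_kpa:
            self.primary_running = False
            self.backup_running = True


def functional_test(skid: PumpSkid, simulated_low_kpa: float) -> bool:
    """Simulate a primary-pump failure and report whether the backup started."""
    skid.on_pressure(simulated_low_kpa)
    return skid.backup_running


print(functional_test(PumpSkid(switch_setpoint_kpa=300.0), 150.0))
print(functional_test(PumpSkid(switch_setpoint_kpa=300.0, switch_healthy=False), 150.0))
```

With a failed switch the test reports that the backup never starts, which is exactly the hidden failure the monthly test is meant to expose.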

A continuous-improvement process

RCM is a program, not a project. At the end of a time period, perhaps one year, revisit what has happened and determine how close results were to expectations. For instance, if expectations were that one specific critical piece of equipment would fail every three years, and it failed five times in the past year, there is a problem. Either the maintenance program is off base and the tasks related to that equipment should be reevaluated or another factor is at play. There are many tools for such situations, from root-cause failure analysis to a complete reevaluation of the maintenance program.
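
A quick calculation shows why five failures in a year is a clear signal rather than bad luck. The sketch below assumes failures occur randomly and independently at the expected rate (a Poisson model), which is a simplification chosen for this example.

```python
import math


def prob_at_least(n_observed: int, expected_per_period: float) -> float:
    """Poisson probability of seeing n_observed or more failures in one period."""
    p_fewer = sum(
        math.exp(-expected_per_period) * expected_per_period**k / math.factorial(k)
        for k in range(n_observed)
    )
    return 1.0 - p_fewer


# Expected: one failure every three years. Observed: five in one year.
p = prob_at_least(5, expected_per_period=1 / 3)
print(f"chance of 5 or more failures in a year: {p:.6f}")
```

Under these assumptions the chance is well below one in ten thousand, so either the expected failure rate was wrong or something has changed in how the equipment is operated or maintained.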

Also, when a business changes, production requirements may change and dictate changes to a maintenance program. Then, a reevaluation of maintenance requirements is in order. Perhaps the decision will be to decrease maintenance. Perhaps the risks will be viewed as significant enough not to change maintenance procedures or frequencies.

The point is that RCM that works requires continual evaluation to ensure its effectiveness. RCM that works is an integral part of any continuous-improvement process. It is about making a sustainable change within a maintenance organization. It is doing the right thing for the right reasons, combined with effective management of change that brings results. Further, it is important to realize that reliability-focused maintenance is a living, dynamic process and is thus more than the cursory application of technologies or strategies.

At the same time, RCM is a process that can yield huge quantifiable benefits. One major chemical company, for example, documents RCM-related maintenance savings at $147 million per year. Savings stem from reduced intrusive engine and compressor maintenance, reduced compressor valve maintenance and condition-based running maintenance.

The order of magnitude of the overall operational and/or financial benefits of implementing RCM will vary from industry to industry, and plant to plant. However, the benefits will be significant in all except those that already possess exceptional uptime records and maintenance procedures.

Michael E. Creecy is a vice president at SKF Reliability Systems, a San Diego, CA-based customer-support business of SKF, a global supplier of products, solutions and services in the business of rolling bearings and seals. Creecy has nearly 20 years' experience working with companies in a variety of industries to help them conduct successful RCM studies and implementations. He invites IMPO readers to contact him at [email protected].
