Predicting The Future

Five steps to implementing an effective predictive maintenance program.

Dec 23, 2016

The benefits of predictive maintenance for capital equipment suppliers are numerous and significant. Predictive maintenance combines sensors embedded in equipment, which generate large amounts of performance data, with connectivity, advanced data analytics, and remote diagnostics to predict and prevent equipment breakdowns. By increasing equipment uptime, suppliers improve the productivity of their products while reducing spare parts usage and service and warranty costs. In turn, these improvements can significantly increase revenues, profits, and the company’s market share.

Many factors that affect machine productivity can be improved through an effective remote monitoring and predictive maintenance program. These factors include breakdowns, setup and configuration time losses, reduced throughput (and higher cycle time), defects and rework, and start-up losses. Yet with so much to gain, many capital equipment suppliers struggle to implement predictive maintenance programs.

Just thinking about predictive maintenance can raise questions you cannot answer, or whose answers you do not know where to find:

Where do we start?
What steps do we need to follow?
How do we know we’re on track?
How do we measure success?

It is easy to understand this apprehension, especially given the rapidly transforming landscape of manufacturing. There may be hundreds, even thousands, of pieces of equipment, spanning multiple generations of the product, deployed in customer facilities around the globe and generating a multitude of error codes, alarms, failures, and anomalies.

However, without a predictive maintenance program, the resolution of failures tends to be ad hoc, unsystematic, completely manual, and time consuming, and it yields sporadic results. In this article, we outline a five-step methodology for implementing an effective predictive maintenance program. Along the way, we share our experience working with a large global capital equipment supplier, which we will call ACME Machine Tool, to provide a real-world application of our process and the results that can be achieved.

1. Identify Significant Failure Scenarios and Plot Incidents of Equipment Failure

Capital equipment suppliers often use Pareto charts to identify significant equipment failure modes. A Pareto chart is a valuable tool for isolating the most common failure modes and prioritizing which failures to resolve first.

The first step in a predictive maintenance program is creating a Pareto chart, which becomes the basis for a much more powerful, real-time predictive model. This step is about investigating and analyzing the most severe and costly equipment failures and defining the scope of information that must be collected to understand the failure scenarios. A deep dive into the potential causes, failure history, sensor data, measurements, and other data sources can surface information associated with the problem.
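As a minimal sketch of the ranking behind a Pareto chart, the Python below totals downtime per failure mode and attaches the cumulative share of downtime. The incident records, the failure-mode names, and the choice of downtime hours as the cost measure are all assumptions made for illustration; service cost or incident count could be ranked the same way.

```python
from collections import defaultdict


def pareto_table(incidents):
    """Rank failure modes by total downtime, largest contributor first.

    `incidents` is an iterable of (failure_mode, downtime_hours) pairs.
    Returns a list of (mode, total_hours, cumulative_fraction) tuples.
    """
    totals = defaultdict(float)
    for mode, hours in incidents:
        totals[mode] += hours

    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    table = []
    running = 0.0
    for mode, hours in ranked:
        running += hours
        share = running / grand_total if grand_total else 0.0
        table.append((mode, hours, share))
    return table


# Hypothetical incident records, for illustration only.
incidents = [
    ("motor", 12.0), ("motor", 8.0), ("coolant pump", 3.0),
    ("spindle bearing", 6.0), ("motor", 10.0), ("coolant pump", 1.0),
]

for mode, hours, cumulative in pareto_table(incidents):
    print(f"{mode:16s} {hours:6.1f} h  cumulative {cumulative:6.1%}")
```

The failure modes at the top of the table, those that account for most of the cumulative share, are the natural candidates for the deep dive described above.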

Deep Dive Analysis Finds Motor Failure Causes Severe Downtime

ACME Machine Tool was experiencing random motor failures causing severe machine downtime. Using iterative predictive maintenance, we found that if we could predict when the motor would fail, we could schedule service during a shift change to avoid a significant downtime incident and improve productivity for the end-user of the equipment. Determining the root cause might avoid service and complete parts replacement in the future.

We began by examining the conditions and data surrounding incidents of the motor’s failure. We established parameters to define the scope of data to examine as well as how we would create hypotheses of potential causes. This effort allowed us to understand what data we were looking for, what equipment sensors we had in place to collect it, and what additional data we would need for further analysis.

2. Let Historical Data Be Your Guide

In this step, begin collecting as much of the identified data as possible. More information is better here, because it is generally too early to know exactly what you are looking for. Data with timestamp information is especially valuable, because later analysis will attempt to correlate the time of each failure with other variables. Your organization should also create a failure event database. The rules that are developed and deployed to predict failures depend on performance and operational data being stored in a location and format that can be quickly accessed and analyzed, so use a database designed for that purpose and load it with data that has been cleaned and prepared.
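One possible shape for such a failure event database is sketched below using Python's built-in sqlite3 module. The table names, column names, and the use of ISO 8601 text timestamps are assumptions for illustration, not a description of any particular product; a production system would more likely use a server-based or time-series database.

```python
import sqlite3

# Hypothetical schema: one table of failure events, one of timestamped
# sensor readings. ISO 8601 strings in a single format sort chronologically.
SCHEMA = """
CREATE TABLE IF NOT EXISTS failure_events (
    id           INTEGER PRIMARY KEY,
    machine_id   TEXT NOT NULL,
    failure_mode TEXT NOT NULL,
    occurred_at  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS sensor_readings (
    machine_id  TEXT NOT NULL,
    recorded_at TEXT NOT NULL,
    signal      TEXT NOT NULL,
    value       REAL NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_readings_machine_time
    ON sensor_readings (machine_id, recorded_at);
"""


def open_database(path=":memory:"):
    """Open (or create) the database and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def readings_before(conn, machine_id, since, failure_time):
    """Return readings for one machine with since <= recorded_at < failure_time."""
    cursor = conn.execute(
        "SELECT recorded_at, signal, value FROM sensor_readings "
        "WHERE machine_id = ? AND recorded_at >= ? AND recorded_at < ? "
        "ORDER BY recorded_at",
        (machine_id, since, failure_time),
    )
    return cursor.fetchall()
```

Indexing readings by machine and time keeps the "what happened before this failure" query fast, which is the access pattern the later steps rely on.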

Log Files Shed Light on Failures

For ACME, we began by pulling all of the log files from the periods surrounding its motor failure incidents. Although this information was not specific to the motor, it contained valuable clues as to what was causing the failure. We sped up the work by semi-automating the parsing of the log files.

We pulled multiple log files into the failure event database and began to synchronize them in order to uncover potential sequences of events that may have led to the failure. Finally, we checked all data in the files for consistency and made use of the full scope of information they represented.
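The parsing and synchronizing described above might look something like the following sketch. The log line format (an ISO 8601 timestamp, a space, then a free-text message) and the source names are assumptions; real controller and drive logs will need their own parsers. Each input log is assumed to be in time order already.

```python
import heapq
from datetime import datetime


def parse_log(lines, source):
    """Yield (timestamp, source, message) tuples from raw log lines.

    Assumes lines such as '2016-03-01T08:15:02 following error warning'.
    Lines whose first field is not a valid timestamp are skipped.
    """
    for line in lines:
        stamp, _, message = line.strip().partition(" ")
        try:
            when = datetime.fromisoformat(stamp)
        except ValueError:
            continue
        yield when, source, message


def merge_logs(named_logs):
    """Merge several individually time-ordered logs into one timeline.

    `named_logs` maps a source name to its list of raw lines.
    """
    streams = [parse_log(lines, name) for name, lines in named_logs.items()]
    return list(heapq.merge(*streams))


# Hypothetical log excerpts, for illustration only.
controller = [
    "2016-03-01T08:15:02 velocity command 1200",
    "2016-03-01T08:15:10 following error warning",
]
drive = [
    "2016-03-01T08:15:05 motor current high",
    "2016-03-01T08:15:12 drive fault",
]

for when, source, message in merge_logs({"controller": controller, "drive": drive}):
    print(when.isoformat(), source, message)
```

Interleaving the sources on a single timeline is what makes a sequence such as "current rises, then following error, then fault" visible across files.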

3. Analyze Multiple Incidents

After gathering the data and getting it into a form suitable for automated analysis, you, as the capital equipment supplier, should examine everything. Specifically, weigh the evidence with the following questions in mind:

What happened physically?
What happened before the failure?
What happened after the failure?
How much total time should we examine to make sure we incorporate all of the variables that could have caused the incident?

To best answer these questions, capture and analyze multiple occurrences of the failure, as many as possible. As this information comes together, it may become possible to identify causality. This is the first step in shining a light on possible cause and effect.
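One simple way to compare multiple incidents is to count how many of them were preceded by each type of logged event within a chosen look-back window. The sketch below assumes a timeline of (timestamp, message) pairs and a list of failure times; the event names and the 30-minute window are hypothetical, and the window length is exactly the "how much total time" question above, so it is worth trying several values.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_before_failures(timeline, failure_times, window):
    """Count, per event message, how many incidents it preceded.

    An event counts once per incident if it occurred at a time t with
    failure_time - window <= t < failure_time.
    """
    counts = Counter()
    for failed_at in failure_times:
        seen = {
            message
            for when, message in timeline
            if failed_at - window <= when < failed_at
        }
        counts.update(seen)
    return counts


# Hypothetical timeline and failure times, for illustration only.
timeline = [
    (datetime(2016, 3, 1, 8, 0), "following error warning"),
    (datetime(2016, 3, 1, 8, 10), "door opened"),
    (datetime(2016, 3, 4, 14, 5), "following error warning"),
    (datetime(2016, 3, 4, 14, 6), "following error warning"),
]
failures = [datetime(2016, 3, 1, 8, 15), datetime(2016, 3, 4, 14, 20)]

counts = events_before_failures(timeline, failures, timedelta(minutes=30))
for message, incidents in counts.most_common():
    print(f"{message}: preceded {incidents} of {len(failures)} failures")
```

An event that precedes most incidents is only a candidate symptom; it still has to be checked against normal operation and against the physics of the equipment.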

Motor Degradation Identified

When we asked these questions of ACME, we found ourselves examining data that was specific to each failure. In doing so, we began to suspect the motor was failing as a result of friction-induced “following error,” that is, the motor’s failed attempts to keep up with velocity commands from the controller. We discovered that ACME had data related to motor degradation that it had not previously analyzed. In this case, knowing the equipment and the physics behind it was critical to linking events and beginning to predict the failure mode.

4. Search for Patterns

At this point, search for patterns and attempt to confirm the leading symptoms. Decision trees and other statistical methods and tools can help you begin to find the cause. Focus on the patterns and try to identify the most likely predictors. Constant iteration and encouraging “new ways of thinking” are especially helpful.
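To make the idea concrete, the sketch below implements the simplest building block of a decision tree: a single split (sometimes called a decision stump) that searches every feature for the threshold that best separates pre-failure samples from normal ones. It is a simplified stand-in for a full decision tree, not the analysis used in the case study, and the feature names and values are hypothetical.

```python
def best_split(rows, labels):
    """Find the (feature, threshold) rule with the fewest misclassifications.

    `rows` is a list of dicts mapping feature name to a numeric value.
    `labels` is a parallel list of booleans (True = sample preceded a failure).
    The rule predicts a failure when the feature value exceeds the threshold,
    so only "higher is worse" relationships are considered.
    Returns (feature, threshold, error_count), or None if no split exists.
    """
    best = None
    for feature in rows[0]:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            errors = sum(
                (row[feature] > threshold) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[2]:
                best = (feature, threshold, errors)
    return best


# Hypothetical samples, for illustration only.
rows = [
    {"following_error": 0.02, "motor_current": 4.1},
    {"following_error": 0.03, "motor_current": 4.6},
    {"following_error": 0.09, "motor_current": 4.4},
    {"following_error": 0.11, "motor_current": 5.2},
    {"following_error": 0.04, "motor_current": 5.0},
]
labels = [False, False, True, True, False]

feature, threshold, errors = best_split(rows, labels)
print(f"best single predictor: {feature} > {threshold:.3f} ({errors} errors)")
```

A full decision tree repeats this search on each side of the chosen split, which is how combinations of symptoms emerge. With only a handful of incidents, any split found this way should be treated as a hypothesis to test, not a conclusion.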

Motor Patterns Reveal Potential Causes of Failure Scenario

Examining patterns for ACME, we isolated a number of symptoms and used decision trees to generate probable causes of the failure scenario. We examined the speed of the motor, the time it took for the motor to settle, and the motor’s voltage and current. Most importantly, we looked at these same data points from many different incident occurrences to see what conclusions could be drawn. We saw that many of the motor failures were preceded by a gradual rise in motor temperature as well as some indications of performance issues with the speed of the motor.

5. Confirm the Leading Symptoms, Develop the Business Rules, and Deploy

In this step, test the symptoms against the known physical characteristics of the system to ensure that they make sense from a physics and engineering standpoint. Next, test the patterns and symptoms against historical data to confirm the predictability of the failure scenario across multiple incidents. The goal is to understand whether this data set would have predicted the failure. For example, if the failure occurred 20 times but the model would have predicted only 10 of them, a hit rate of 50 percent, the model may not be sufficient. Analyze many variables and many variable combinations as you determine how to increase the hit rate.
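A backtest of this kind can be sketched as follows. The function treats a failure as predicted if at least one alert fired within a chosen lead window before it, and counts alerts that preceded no failure as false alarms. The time values (hours since the start of the record), the 24-hour lead window, and the function name are assumptions for illustration.

```python
def backtest(alert_times, failure_times, lead_window):
    """Score historical alerts against historical failures.

    A failure is a hit if some alert time t satisfies
    failure_time - lead_window <= t < failure_time.
    Returns (hit_rate, false_alarm_count).
    """
    hits = 0
    useful_alerts = set()
    for failed_at in failure_times:
        matching = [
            t for t in alert_times
            if failed_at - lead_window <= t < failed_at
        ]
        if matching:
            hits += 1
            useful_alerts.update(matching)

    false_alarms = len(set(alert_times) - useful_alerts)
    hit_rate = hits / len(failure_times) if failure_times else 0.0
    return hit_rate, false_alarms


# Hypothetical history in hours since the start of the record.
failure_times = [100, 200, 300, 400]
alert_times = [90, 150, 395]

hit_rate, false_alarms = backtest(alert_times, failure_times, lead_window=24)
print(f"hit rate {hit_rate:.0%}, false alarms {false_alarms}")
```

Tracking false alarms alongside the hit rate matters because a rule that alerts constantly would catch every failure while sending technicians on needless service calls.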

Next, develop business rules that will predict the failure in real time by implementing the predictive model within your Internet of Things (IoT) platform/infrastructure. This puts into operation the failure prediction derived from the analysis in the previous steps. The assumption here is that you already have an IoT platform/infrastructure into which the business rules can be implemented. Such a platform is critical to deploying real-time predictive maintenance.

Analysis Confirms Predictability of Motor Failures

Through regression testing, we were able to correlate a large percentage of ACME Machine Tool motor failures with a combination of increased following error (i.e., the inability of the motor to keep up with where it is supposed to be) and increased motor current over time. As it turned out, an internal defect in certain motors, under certain running conditions, was causing internal friction leading to the malfunction. Business rules were set up in the IoT platform to detect the appropriate trends, and through this method, motors were replaced before failure. Additionally, the supplier of the motor provided a fix that removed this failure mode from its product for all future motors.
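A trend-based rule of the kind described here could be sketched as below: fit a least-squares slope to recent samples of following error and motor current, and flag the motor when both are rising faster than chosen limits. The slope limits, sample values, and function names are hypothetical, and a real IoT platform would express the rule in its own rule language rather than in Python.

```python
def slope(values):
    """Least-squares slope of `values` against their sample index."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def motor_at_risk(following_error, motor_current,
                  min_error_slope, min_current_slope):
    """Return True when both signals trend upward faster than the limits.

    Both inputs are recent samples, oldest first, taken at a fixed interval.
    The slope limits are per-sample and would be tuned against history.
    """
    return (slope(following_error) > min_error_slope
            and slope(motor_current) > min_current_slope)


# Hypothetical recent samples, for illustration only.
rising_error = [0.02, 0.03, 0.05, 0.06, 0.08]
rising_current = [4.0, 4.2, 4.3, 4.6, 4.8]
steady_current = [4.0, 4.1, 4.0, 4.1, 4.0]

print(motor_at_risk(rising_error, rising_current, 0.01, 0.1))
print(motor_at_risk(rising_error, steady_current, 0.01, 0.1))
```

Requiring both trends at once mirrors the combination of symptoms identified in the analysis and helps avoid alerts from a single noisy signal.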

ACME Machine Tool implemented a predictive maintenance program to use data from equipment sensors and other sources to anticipate machine failures. By moving through a step-by-step methodology including advanced statistical analytics and data-mining techniques, ACME was able to successfully predict equipment failures.

Progress Toward Prediction – the Predictive Maintenance Advantage

Once this five-step methodology is successful in predicting and preventing a high percentage of the most egregious failures, your organization should move on to the next most egregious failure and iterate through the same steps. An iterative predictive maintenance program such as this creates long-term value for you and your customers through reduced costs, improved productivity, and increased market share.

With predictive maintenance, you can improve machine uptime, increase production, and reduce overall costs. These benefits can mean increased revenues, a stronger competitive advantage, and a healthier bottom line for your business. According to Keith Mobley’s Plant Engineer’s Handbook, predictive maintenance can reduce unexpected equipment failures by 55 percent and increase uptime by 30 percent.

It’s clear that you can invest time and money in a predictive maintenance program now or be forced to deal with the more expensive consequences later. Don’t wait until later. The manufacturing landscape is evolving rapidly, and for capital equipment suppliers looking for ways to differentiate, we know that predictive maintenance can be your distinct competitive advantage today.