There are many methods of problem solving out in the world. They range from a home-grown process within a company to various vendor methodologies. The methodologies range from very easy to use to somewhat complex. They also range in their ability to eliminate recurrence.
Let’s break this down a little further. Some methods are called problem solving, and some are called root cause analysis (RCA). This is where the terminology needs defining: without definitions, one could call any of these methods troubleshooting, problem solving or root cause analysis, and no one would be the wiser. The following definitions add some clarity:
- Troubleshooting is a process of elimination (trial and error) — eliminating potential causes of a problem.
- Problem solving is the systematic search for the source of a problem, so it can be solved.
- Root cause analysis is the analysis of problems down to their latent root causes: the deficiencies in management systems and the restraining cultural norms that allowed the failure to occur.
By these definitions, troubleshooting is neither problem solving nor root cause analysis, because it is not a systematic approach. Troubleshooting is a form of trial and error, and its ability to solve problems depends solely on the troubleshooter’s skill and experience with the problem.
Problem solving fits the blueprint for continuous improvement, but leaves the depth or shallowness of the investigation up to the user. Problem solving tends to stop too early in the investigation process to eliminate a problem’s entire failure mechanism. In many cases, problem solving does identify the physical root cause of a problem, but it is not designed to uncover latent system issues.
Root cause analysis examines a problem to the depth at which the physical, human and system deficiencies are exposed for resolution. Working at this depth not only eliminates a problem’s recurrence, but also produces corrections that can be leveraged in other areas where the same system problems exist.
Determine the Problem-Solving Mission
Most organizations don’t bother to define a mission for the problem-solving process because they believe the mission should be apparent. Those that do usually wrap it into some other program requirement, such as continuous improvement.
Continuous improvement is a term used often in the problem investigation world, either as part of the mission or as the mission itself. As with many other terms, continuous improvement has many interpretations. When a problem is solved, the continuous improvement mission is often considered met. But how often does the same problem repeat at some later date, and does that still qualify as continuous improvement?
Improvement depends on how the organization defines problem solving. Many problem-solving methods satisfy an interpretation of continuous improvement simply by restoring an uninterrupted work process, which only postpones the return of the problem.
Are incremental continuous improvements really just encouraging employees to make problems a little better and to settle for mediocrity?
Problem elimination is a more in-depth problem-solving mission, and the interpretation is clear. The failure mechanism must be identified and eliminated, so there is no chance of recurrence. This also meets the continuous improvement definition, but on a quantum improvement basis rather than an incremental improvement basis.
There is room for both definitions when problems are broken down into a two-track approach to failure avoidance. The two-track approach is proactive because an opportunity analysis tool sorts problems into two categories: “significant few” and “random many.” The significant few are the 20 percent of problems that account for 80 percent of the dollars spent on repair; the random many are the remaining problems, which account for only 20 percent of losses. Significant issues should always be solved for elimination, and all other problems solved for incremental continuous improvement gains. What cannot be eliminated should be prevented and/or made fail-safe. Problem mechanisms can often be eliminated, but only if one takes the time to do an in-depth RCA investigation.
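The “significant few” and “random many” split can be illustrated with a short sketch. This is not a specific vendor's opportunity analysis tool, only a minimal Pareto-style ranking under stated assumptions: the function name `split_by_cost`, the 80 percent default threshold and the annual repair costs are all hypothetical and chosen for illustration.

```python
def split_by_cost(costs, loss_share_pct=80):
    """Split problems into 'significant few' and 'random many'.

    costs: mapping of problem name -> repair cost (hypothetical figures).
    loss_share_pct: share of total losses the significant few should cover.
    """
    total = sum(costs.values())
    # Rank problems from most to least costly.
    ranked = sorted(costs.items(), key=lambda item: item[1], reverse=True)

    significant_few, random_many = [], []
    running = 0  # cumulative cost of the problems already classified
    for name, cost in ranked:
        # A problem is "significant" while the problems ranked above it
        # have not yet covered the target share of total losses.
        if running * 100 < loss_share_pct * total:
            significant_few.append(name)
        else:
            random_many.append(name)
        running += cost
    return significant_few, random_many


# Hypothetical annual repair costs in dollars (illustrative data only).
costs = {
    "pump seal failure": 50000,
    "conveyor belt tear": 30000,
    "sensor drift": 6000,
    "valve leak": 5000,
    "motor overheating": 4000,
    "loose guard": 2000,
    "label jam": 1500,
    "filter clog": 1000,
    "fuse trip": 300,
    "gasket wear": 200,
}

significant_few, random_many = split_by_cost(costs)
print(significant_few)  # ['pump seal failure', 'conveyor belt tear']
```

In this made-up data set, two of the ten problems account for 80 percent of the cost, so they would be candidates for in-depth RCA and elimination, while the other eight would be handled for incremental gains. Real cost data rarely splits this cleanly, so the threshold is best treated as a guide rather than a rule.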
Problem Solving for Elimination
Problem-solving methods are available for purchase, and each consists of a step-by-step methodology for problem resolution.
Some are as simple as asking the question “why” five times, on the premise that the fifth answer is the cause. Some involve sitting at a table as a group or team, “brainstorming” the reasons a problem occurs and leaving with a solution to implement. Others require rigorous cause-and-effect analysis, using “logic trees with hypothesis verifications,” which uncovers several levels of root causes.
No matter which method is adopted by an organization, it will take more than just the method to be successful at solving problems, especially when the mission is to eliminate recurrence.
The first consideration is: Are the support systems that are needed in place? Problem elimination means an in-depth investigation, and when conducting an in-depth investigation, technical support for proving and disproving hypotheses is necessary. Other support, such as providing the time and resources to perform an investigation, must also be considered.
Is there a standard for evidence or data collection, and do all employees know what to do after an incident has occurred? In many cases they do not, which creates a barrier to solving the problem. Employees are very rarely trained to do root cause analysis. What they are trained to do is fix the problem, discard the failed components and start up the equipment. The discarded evidence may never be recovered, and a successful analysis becomes nearly impossible.
Does the investigator know and follow the investigative methodology? Most methods are designed to get to the correct answers, but they are seldom practiced as designed.
An investigator who can read the fractured surfaces of failed electrical and mechanical components speeds up an analysis, because there is no wait for a third-party analysis to be completed. Basic knowledge in this area is enough to resolve approximately 80 percent of the analyses conducted. A method alone cannot provide this internal knowledge.
A solid understanding of why humans make mistakes is necessary when hypothesizing about human involvement in an incident. Human error is manageable when its drivers are understood. Again, this knowledge cannot be acquired from a method alone. Identifying deficient management systems, the latent root causes, will uncover what drives human error.
When the investigator is a practiced “true lead investigator” supported by management, the success from the elimination of significant problems will be added to the bottom line and that’s where success is measured.