There are various types of root cause analysis but they all aim to identify the underlying reasons for an incident. The goal is not just to address the immediate causes of an event but to dig more deeply and to identify the fundamental issues that allowed the incident to occur. By identifying and addressing these root causes, organizations can prevent similar incidents from happening in the future.
However, there are difficulties with the very concept of root cause analysis. In particular, the term ‘root cause’ means different things to different people, so it is difficult to come up with an agreed-upon definition.
An 800 person forum comprised of Root Cause Analysis (RCA) practitioners from all over the world tried to define “Root Cause Analysis.” They could not agree on an answer. . . . It means different things to different industries – even different things within the same industries. It is even difficult to find consistency within the same companies, or even sites within a company.
(Nelms, 2007)
One reason it is difficult to define the term ‘root cause’ is that it is not possible to find the true, fundamental cause of an event. Every event has one or more causes. These causes are themselves events, which have their own causes, and so on. The chain can regress infinitely, both in depth and in width.
To add to the difficulties, different people have their own preferred line of causation.
Let’s say that a pump seal has failed. One investigator may note that the wrong type of seal was installed. Therefore her root cause trail will examine the company’s purchasing and procurement procedures. At the conclusion of the investigation she may define the root cause of the failure as ‘Limitations in the enterprise resource software’.
Another investigator may find that the maintenance technician who installed the seal had not been provided with accurate procedures, nor had he received training for the installation of this type of seal. Therefore this investigator's root cause trail will scrutinize the process for writing procedures and for making sure that people are properly trained in the use of those procedures. His definition of the root cause of the failure may be ‘Failure to write adequate maintenance procedures and to properly train maintenance technicians’.
A third investigator may note that the process liquid in the pump does not have the same composition as that specified in the original design. He or she may then develop a root cause trail concerned with materials failure caused by the liquid change, resulting in a root cause of ‘Management of Change system provides inadequate guidance regarding material integrity checks’.
So, we have at least three ‘root causes’:
Enterprise software;
Maintenance procedures; and
Management of Change.
Each one of these is a root cause, but none of them are the root cause.
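The pump seal example can be sketched as a small cause graph. The following is only an illustration, not a recognized RCA technique: the dictionary layout, the function name `causal_trails`, and the shortened wording of the intermediate causes are assumptions made for this sketch. It shows that walking the graph from the same failure produces three separate trails, each ending at a different ‘root cause’.

```python
# Hypothetical cause graph for the pump seal example.
# Each key is an event; its list holds the causes an investigator recorded for it.
# The structure and the shortened labels are assumptions made for illustration.
CAUSES = {
    "Pump seal failed": [
        "Wrong type of seal installed",
        "Technician lacked accurate procedures and training",
        "Process liquid differs from original design",
    ],
    "Wrong type of seal installed": [
        "Limitations in the enterprise resource software",
    ],
    "Technician lacked accurate procedures and training": [
        "Failure to write adequate maintenance procedures and to properly train maintenance technicians",
    ],
    "Process liquid differs from original design": [
        "Management of Change system provides inadequate guidance regarding material integrity checks",
    ],
}


def causal_trails(event, causes=CAUSES):
    """Return every chain from `event` down to a cause with no recorded cause of its own."""
    deeper = causes.get(event, [])
    if not deeper:
        # Nothing further was recorded, so the trail stops here.
        return [[event]]
    return [
        [event] + trail
        for cause in deeper
        for trail in causal_trails(cause, causes)
    ]


trails = causal_trails("Pump seal failed")
stopping_points = [trail[-1] for trail in trails]

for trail in trails:
    print(" <- ".join(trail))
```

Each trail ends only because nothing further was recorded in the graph. Adding one more entry to `CAUSES` (for example, a cause for the software limitations) would move that stopping point, which is the regress problem described above.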
The existence of an indeterminate number of root causes may help explain some of the frustration that is occasionally expressed with standardized incident analysis procedures and software. In spite of their structured approach these systems are fundamentally subjective. For example, one technique helps the investigation team list many of the possible causes that led to an event. Some of these causes are then identified as ‘causal factors’ which are then developed into root causes. Yet the determination as to which causes are causal factors will necessarily depend on the training, experience and opinion of the persons making that selection.
Nelms describes this difficulty:
The problem with Root Cause Analysis is that it has become whatever people want it to be. If you only want to see problems in your "Management Systems," that's all you will see. If you only want to understand the physical mechanisms of problems that is all you will see.
A further potential difficulty with regard to root cause analysis is the danger of drawing general conclusions from inadequate sample sizes. For example, an investigation into the pump seal failure may find that the maintenance procedures for installing that seal were very difficult to follow. The investigator must be very careful, however, about developing a general recommendation such as, ‘Maintenance procedures at the facility require a major upgrade’. It could be that all the maintenance procedures are of high quality with the exception of this one.
The above critique does not mean, of course, that incident investigations should not attempt to find root causes. But it does mean that we are looking for root causes (plural) rather than the root cause.
Fully agree! This is why it is so important to have a diverse team to get the different starting points and past experiences into the investigation.
For me the biggest challenge is to get to the "Loss of Control" (=Management failures) and to move beyond "human failure".
One question: you mention "Nelms, 2007" as a source. Could you please provide the full reference? I have not been able to find it.
Incidents are big and small, so as an Instrument Technician, when commissioning new installations, I like to simulate the obvious equipment failures and observe the outcomes to see how they differ from the expected outcomes, if it is part of a shut-down system. One of the biggest issues is that the engineers do not know the specifics of the equipment and do not read the manufacturers' manuals, and as such they miss important details relating to the installation requirements. For years we had a critical valve that would not control properly, and the engineer was just going to replace it. I got to review the valve's operation and found the positioner shaft broken. The positioner was replaced, but I have already seen it in action and it will break again, because no one looked at why the positioner broke. Another recent example was an O2 analyser that controls a furnace, which needed major control boards replaced twice in 8 years, while two other units have had no problems in 15 years. The vendor will never know about the external relay that is the cause of the failures, and again the engineer will do nothing to understand why, other than blame the equipment. So instrumentation needs multiple engineering disciplines to understand why something failed; in the case of the control valve it is most likely a mechanical issue that an electrical engineer will not see.