Last year we published the book The OSHA Process Safety Standard: The 30-Year Update. In it we stated that one reason OSHA is updating their PSM rule is to respond to the findings and recommendations of the Chemical Safety Board (CSB).
The CSB’s investigations have also led to changes in State process safety regulations. Today, in their bulletin CSB Applauds Washington State’s New Refinery Safety Regulation (January 5, 2024) the CSB says,
Following the CSB’s investigation of the April 2, 2010, catastrophic explosion and fire at the Tesoro Anacortes Refinery in Anacortes, Washington, that resulted in seven fatalities, the CSB issued three recommendations to the Governor and Legislature of the State of Washington to strengthen Washington’s Process Safety Management (PSM) program.
The State of Washington has updated their PSM regulation. The changes have three elements:
Damage mechanism reviews,
Hazard control analyses hierarchy, and
Root cause analysis after significant accidents.
The last of these items ― root cause analysis ― is the theme of this post.
Types of Root Cause Analysis
Root cause analysis is tricky. In our book Process Risk and Reliability Management we suggest that there are at least four types of root cause analysis:
Argument by analogy (story-telling);
Barrier analysis;
Categorization; and
Systems analysis.
This means that the first decision an incident analysis team has to make is which approach to follow.
But our difficulties don’t stop there.
The Meaning of ‘Root Cause Analysis’
The term ‘root cause’ suggests that there is a single factor that, if removed, would prevent the recurrence of all incidents similar to the one being investigated. But the term ‘root cause’ means different things to different people, so it is difficult to come up with an agreed-upon definition.
An 800 person forum comprised of Root Cause Analysis (RCA) practitioners from all over the world tried to define “Root Cause Analysis.” They could not agree on an answer. . . . It means different things to different industries – even different things within the same industries. It is even difficult to find consistency within the same companies, or even sites within a company.
Nelms 2007
A Multitude of Causes
One reason it is difficult to define the term ‘root cause’ is that the true, fundamental cause of an event cannot be found: each event has one or more causes, and those causes are themselves events with causes of their own, and so on. The chain can regress infinitely, both in depth and width, thus creating what has been referred to as the ‘Root Cause Myth’ (Gano 2007).
Moreover, different people have different perceptions as to the causes of events. For example, if a pump seal fails, one investigator may note that the wrong type of seal was installed. Therefore her root cause trail will examine the company’s purchasing and procurement procedures. At the conclusion of the investigation she may define the root cause of the failure as, ‘Limitations in the enterprise resource software’.
Another investigator may find that the maintenance technician who installed the seal had not been provided with accurate procedures; nor had he received training for the installation of this type of seal. Therefore this root cause trail will scrutinize the process for writing procedures and for making sure that people are properly trained in the use of those procedures. The investigator’s definition for the root cause of the failure may be, ‘Failure to write adequate maintenance procedures and to properly train maintenance technicians’.
A third investigator may note that the process liquid in the pump did not have the same composition as used for the original design. He or she may then develop a root cause trail to do with materials failure caused by the liquid change, resulting in a root cause, ‘Management of Change system provides inadequate guidance regarding material integrity checks’.
So here is the dilemma:
Root causes do not exist. We merely have events which have causes, which in turn become events that have causes, and so on.
Root cause analyses are profoundly subjective.
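The dilemma can be made concrete with a small sketch. The Python below is only an illustration, not a recognized root cause analysis method: it encodes the pump seal example as a graph of events and causes, using shortened labels of our own for the causes named by the three investigators. The function names stopping_points and follow_trail are hypothetical. The sketch shows that the answer to "what is the root cause?" depends on which branch is followed and on how far back the team decides to go.

```python
# Illustrative causal graph for the pump seal example in this post.
# Each key is an event; its value lists causes an investigator might identify.
# The labels are shortened paraphrases, not terms from any regulation.
CAUSES = {
    "pump seal failed": [
        "wrong seal type installed",
        "technician lacked procedure and training",
        "process liquid composition changed",
    ],
    "wrong seal type installed": ["purchasing and procurement procedures"],
    "purchasing and procurement procedures": [
        "limitations in enterprise resource software"
    ],
    "technician lacked procedure and training": [
        "inadequate procedure writing and training process"
    ],
    "process liquid composition changed": [
        "Management of Change guidance inadequate"
    ],
}


def stopping_points(graph, event, max_depth):
    """Return the causes reached when every trail stops after max_depth
    steps, or earlier if no further cause has been recorded."""
    if max_depth == 0 or event not in graph:
        return {event}
    found = set()
    for cause in graph[event]:
        found |= stopping_points(graph, cause, max_depth - 1)
    return found


def follow_trail(graph, start, first_cause):
    """Follow one investigator's trail: begin with the cause they noticed
    first, then keep taking the first recorded cause until none is left."""
    path = [start, first_cause]
    while path[-1] in graph:
        path.append(graph[path[-1]][0])
    return path


if __name__ == "__main__":
    # The same incident yields different 'root causes' at different depths.
    for depth in (1, 2, 3):
        print(depth, sorted(stopping_points(CAUSES, "pump seal failed", depth)))
    # One investigator's trail, starting from the wrong seal type.
    print(follow_trail(CAUSES, "pump seal failed", "wrong seal type installed"))
```

The graph here stops only because we stopped writing it down. A real investigation could add further causes beneath each entry, which is the point of the ‘Root Cause Myth’.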
The Washington State Rule
The new Washington State rule on root cause analysis is referenced in WAC 296-67-363 Incident investigation – Root cause analysis. It contains the following language.
. . . the report must also include a method for performing a root cause analysis.
We see that the rule does recognize that there are different types of root cause analysis, but it does not describe what they are. The rule also takes it for granted that root causes actually exist.
Conclusions
The above critiques do not mean, of course, that incident investigators should not attempt to find root causes. But they do mean that investigators are looking for root causes (plural) rather than the root cause (singular). The incident team should include people with different backgrounds within the organization so as to minimize the inevitable subjectivity that is part of any root cause analysis.
The goal of most root cause analyses is to generate recommendations that have generic application. In the case of the pump seal failure, it is not enough to prevent that particular seal from failing again; the analysis should help reduce the frequency of all seal failures.
If that goal of making a generic recommendation is achieved then the team can plausibly claim to have found at least one root cause, particularly if that recommendation can be implemented within the bounds of operational and financial realities.