ML22277A013

Investigation of the Use of Causal Analysis Based on System Theory (CAST) at the NRC
Issue date: 09/30/2021
From: Mauricio Gutierrez
NRC/RES/DE
Gutierrez M 301-415-1925

INVESTIGATION OF THE USE OF CAUSAL ANALYSIS BASED ON SYSTEM THEORY (CAST) AT THE NRC

September 2021

John Thomas¹

In collaboration with the U.S. Nuclear Regulatory Commission: Sushil Birla, Bernard Dittman, and Mauricio Gutierrez

Contract Officer's Representative: Mauricio Gutierrez

¹ E-mail: jthomas@advancedengineeringservices.org


1. Abstract

Causal Analysis Based on System Theory (CAST) is an accident/incident analysis method based on systems theory and the STAMP (System-Theoretic Accident Model and Processes) model of accident (loss) causation. CAST defines a systematic approach to learning from losses and from events with the potential to cause losses. The causes it addresses include complex and systematic causes related to human behaviors and decisionmaking, inadequate procedures, erroneous operating or engineering assumptions, common causes that defeat diversity, gaps in engineering and assessment methods that overlooked common causes or considered them only superficially with incorrect assumptions, inadequate management and oversight, safety culture, and other systemic causes that lead to losses. CAST is designed to minimize cognitive biases during an incident or accident investigation and to identify more effective solutions.

This project explores the learnability and suitability of CAST for supporting investigations and analyses of events in the nuclear industry and, more broadly, the potential of STAMP-based methods. A series of five engagements introduced staff members of the U.S. Nuclear Regulatory Commission (NRC) to CAST and elicited feedback on any learnability limitations that the NRC staff experienced. The five engagements were a CAST seminar series, two CAST workshops, and two leadership seminars that introduced CAST to selected NRC managers and presented preliminary results.

Key findings from this investigation are the following:

The NRC staff participants demonstrated the ability to learn the concepts underlying CAST.

The NRC staff participants demonstrated the ability to use CAST to analyze nuclear industry events and identify causes that were overlooked by teams using traditional methods.

The NRC staff participants found that CAST is a suitable complement to existing regulatory activities and would be beneficial in analysis of operating experience events, licensee event reports (LERs), investigations, and inspections.

Introduction of such techniques into the NRC's processes will require a more proactive approach and investment.

2. Introduction

This report documents the findings of a collaborative NRC-sponsored investigation of the use of CAST at the NRC. CAST [1] is an accident/incident analysis method based on systems theory that identifies systemic factors that are often overlooked in traditional LER and root cause analysis (RCA) approaches. The goal of this project was to evaluate the potential regulatory uses of CAST at the NRC, and the potential barriers to its use, for nuclear safety systems, event reports, independent RCA, safety analysis reviews, and other areas of potential interest to the NRC. The primary motivation for the investigation was to improve regulatory practices pertaining to safety-related digital instrumentation and controls (DI&C) through learning from operating experience. A secondary motivation was that licensing staff who were interested primarily in learning System-Theoretic Process Analysis (STPA) might find it easier to learn its foundation, STAMP, in the context of CAST.

The NRC project manager invited over 60 NRC staff members to participate in a series of CAST seminars, workshops, and other engagements to introduce the CAST process, perform group exercises, review existing applications and results, evaluate the potential regulatory use of CAST at the NRC, and collect feedback from NRC staff. NRC staff feedback and evaluations examined aspects of CAST learnability, usability, potential NRC benefit, and whether CAST can practically address critical NRC needs.

3. Principal Findings

The following are the principal findings:

The NRC staff demonstrated the ability to learn the concepts behind CAST.

The NRC staff recognized the potential benefit of using CAST in regulatory oversight, learning from operating experience, and using the learning to improve its regulatory practices.

Introduction of such techniques into the NRC's processes will require a more proactive approach and investment.

For best results, a small team should consist of participants with diverse backgrounds, knowledge, and experience.

The NRC participants unanimously reported that they understood the topics covered in the CAST sessions, which was reinforced by their performance of facilitated CAST exercises. The NRC participants also unanimously reported that the agency could benefit from performing CAST and that CAST would enable the NRC to do its job more effectively.

The key learnability and capability-building findings are as follows:

CAST learning and capability-building require hands-on practice with supervision or facilitation by a qualified instructor. Reading materials and standalone presentations may provide limited familiarity, but they do not enable proficiency without hands-on experience.

Hands-on experience with real-world cases, including information about systematic causes, is important for effective learning.

Extended question-and-answer sessions and open discussions with an expert CAST facilitator helped clarify key points and resolve potential confusion.

Learning can be inhibited if participants do not make a time commitment (e.g., they skip sessions). About a third of the NRC participants were unable to commit to full attendance because of scheduling conflicts. NRC participants who could attend all sessions were able to meet the learning objectives. A few participants reported confusion because they missed a lecture or discussion point, and those points had to be revisited to clear the confusion.

The instructor observed no significant learnability barriers for the NRC staff who participated throughout the learning sessions. The principal challenge noted by the instructor was accommodating multiple staff scheduling conflicts and enabling staff attendance at all CAST sessions. NRC staff members who had experience with or familiarity with past events and investigations were able to learn CAST principles more quickly than staff who lacked that experience.

The NRC staff observed that the agency needs additional learning opportunities to realize the benefits of CAST:

A larger scale effort, representative of DI&C issues encountered in real-life events, is needed to further develop NRC staff skills to use CAST for deeper learning from operating experience.

The following specific next steps are recommended:

Select NRC projects that would benefit from CAST.

Identify NRC participants who should be engaged.

Build NRC staff skills to the level needed for analyzing real-world events, enabling deeper learning than in current practice and identification of improvements over current practice.

4. The CAST Learning Sessions

NRC staff members participated in a series of CAST sessions:

CAST seminar series (two sessions)

CAST workshops (two sessions)

Leadership Seminar 1

Leadership Seminar 2

4.1. CAST Seminar Series

The seminar series introduced the principles and foundations of CAST, including STAMP, and the CAST process itself. The goal was to provide a basic understanding of the process, to evaluate the learnability of the process, and to equip the NRC participants with the capability to (1) evaluate the potential benefit of using CAST in regulatory oversight, in learning from operating experience, and in using that learning to improve the NRC's regulatory practices and (2) identify the potential barriers to its effective use at the NRC.

The CAST seminar series began with case studies from nuclear and nonnuclear industries. The case studies illustrated modern challenges in accident and incident analysis and the common fallacies and points of confusion that make these challenges seem intractable, and they demonstrated effective models, solutions, and lessons learned from past accidents (loss events), human factors, and engineering mistakes.

The seminar series explained the CAST process using real-world examples and interactive exercises. Participants learned how the CAST process is used to identify nontrivial human interactions, digital interactions, and automated behaviors that lead to losses, as well as the reasons behind those behaviors. Some examples included discussions of whether various methods can consistently identify causal factors, especially the causes of human behavior or human errors.

The CAST seminar series included a facilitator-assisted exercise based on a fictional sequence of events and ultimate loss that was constructed from real incidents. The NRC participants were randomly assigned to one of five groups, and each group applied CAST to analyze the incident and identify recommendations and corrective actions. All groups demonstrated the ability to model a preliminary control structure and to identify unsafe behaviors, beliefs, contextual factors, assumptions, and systemic factors that tend to be overlooked. The analysis covered control room operations and assumptions, maintenance actions and assumptions, management actions and assumptions, and external factors that were not adequately controlled. The recommendations and corrective actions proposed by the groups were compared to typical results obtained without CAST. The results from NRC participants using CAST were found to address important gaps, including systemic factors and incorrect operational assumptions that would persist if not addressed.

At the end of the series, NRC participants demonstrated a basic comprehension of the CAST process.

Although participants were invited from many different areas across the NRC, their backgrounds did not appear to be a determining factor in their ability to learn and understand CAST. The strongest determining factor in participants' ability to learn and understand CAST was their level of attendance and participation in the CAST seminar series. A secondary factor was their level of experience with event investigations and systematic causes of past events.

Qualitative and quantitative feedback was collected from NRC participants throughout the seminar series and was used to form the conclusions in this report. For additional information about the feedback collected, see Appendices A and B.

4.2. CAST Workshop Series

The workshop series facilitated more intensive hands-on application of CAST by participants.

The goal was to offer additional practice and obtain additional observations about CAST learnability and use by NRC participants.

Working in groups of two to five with an expert CAST facilitator's guidance, participants applied CAST to two real events. All groups could consult the facilitator about the CAST method as needed. The facilitator's role was to answer questions and, after each CAST step, to review each group's results with all of the groups. At the end of the session, the CAST results from participants were compared with the official investigation results and corrective actions from LERs and RCAs. The participants noted that most of the CAST findings were not captured or addressed in the official investigation results, including the underlying causes of unsafe human behaviors, the incorrect operating and engineering assumptions that were made, the gaps in the engineering methods of the time that failed to catch these assumptions, and management and organizational factors. The workshop series ended with a detailed discussion of open questions and insights raised by participants.

All groups were able to follow the CAST process to identify real causes, including systemic causes and specific reasons for human behaviors, as well as additional questions that should be asked during an investigation. The CAST results produced by NRC participants showed that CAST provided critical new insights, questions, and solutions that were absent from the results of traditional analyses, such as the actual LERs and RCAs.

Although all teams were successful in identifying real causes, the most consistent results were produced by CAST teams of about four or more people. This observation is consistent with the guidance in the CAST Handbook [1] that CAST be applied by interdisciplinary teams.

Participants were asked about the muddiest part of the workshop and any lingering points of confusion. This discussion yielded additional insights:

Most participants indicated that the CAST process was clear with no significant points of confusion.

Several participants noted that their confusion was due to missing previous sessions because of unavoidable scheduling conflicts. About one-third of the participants reported that they had to miss some CAST sessions. Formal training classes could address this by mandating attendance and requiring a specific time commitment from participants.

One identified point of confusion was keeping the analysis at the appropriate level of abstraction. Formal training classes could address this point of confusion by providing additional practice with abstraction in CAST. This point of confusion is sometimes raised by CAST beginners who are detail-oriented experts and may have less experience with top-down processes like CAST. An effective way to overcome this barrier is to begin with exercises outside the participants' expertise, such as a nonnuclear example. The level of detail presented in the exercise can be controlled to help participants become comfortable with high levels of abstraction. Later exercises can gradually transition to applications closer to the participants' area of expertise as participants build skill in analyzing an event at high levels of abstraction.

Appendices A and B summarize the results from the participant surveys.

4.3. Leadership Seminars

Two leadership seminars were held to brief NRC leadership on the results of this project and enable discussion of the NRC staff conclusions and recommended next steps. The leadership participants included the following:

Office of Nuclear Regulatory Research (RES), Division of Risk Analysis (DRA):
Director, Mark Thaggard
Deputy Director, Christian Araguas
Senior Reliability & Risk Engineer, Christiana Lui
Branch Chiefs from the Performance and Reliability Branch, the Probabilistic Risk Assessment Branch, and the Human Factors and Reliability Branch

RES, Division of Systems Analysis, Accident Analysis Branch Chief

RES, Division of Engineering, Deputy Director

Office of Nuclear Reactor Regulation, Division of Engineering & External Hazards, Director

The CAST seminar and workshop participants observed that CAST could augment their learning from operating experience for the purpose of improving regulatory oversight of safety-related DI&C. For example, CAST insights could help the staff seek additional information from the licensee, as needed, to understand safety implications; improve inspection procedures; improve review guidance; and improve regulatory guidance.

5. Limitations

The following limitations of this effort have been recognized:

The CAST engagements were limited to NRC staff members who were available and able to participate. Participants had competing priorities, and there was no ability to require a time commitment.

The participants came from the following NRC groups²:

1. Office of Research
a. Division of Engineering
b. Division of Risk Analysis
c. Division of Systems Analysis
2. Office of Nuclear Reactor Regulation
a. Division of Advanced Reactors and Non-Power Production and Utilization Facilities
b. Division of Engineering and External Hazards
c. Division of Reactor Oversight
d. Division of Risk Assessment
3. Office of the Chief Information Officer
4. Office of Nuclear Security and Incident Response
a. Division of Physical and Cyber Security Policy
5. Region II, Division of Reactor Safety
6. Region III, Division of Reactor Safety

Participants included technical reviewers, regional inspectors, researchers, and managers.

The areas of expertise of the NRC participants included the following:

electrical engineering
mechanical engineering
nuclear engineering
probabilistic risk assessment
operating experience
instrumentation and control
cybersecurity
information technology

The level of experience of the NRC participants ranged from summer hires to 20+ years of professional experience.

² Instrumentation and control branches from the Office of Nuclear Reactor Regulation and RES are included.

The CAST engagements were not designed to replace an extensive CAST training class. The engagements were exploratory, designed to provide a basic exposure to and familiarity with CAST, and intended to enable observations about CAST learnability, applicability, and capability. Additional training would be needed to produce skilled CAST practitioners.

6. Conclusions and Next Steps

Conclusions and recommendations were derived from NRC participant evaluations and surveys collected throughout this effort, as well as from group discussions and interviews. The principal conclusions are as follows:

The NRC staff was able to learn the concepts underlying CAST.

The NRC staff reported that the agency could benefit from using CAST in regulatory oversight, learning from operating experience, and using the learning to improve its regulatory practices.

For best results, a small team consisting of participants with diverse backgrounds, knowledge, experience, and thinking processes should perform a typical analysis of an event.

The NRC staff observed that additional learning is needed to realize the benefits listed above and identified the following next steps:

A larger scale effort is needed to further develop NRC staff skills to support CAST use in the NRC's regulatory oversight processes.

The staff-recommended next steps include the following:

Select NRC projects that would benefit from CAST.

Identify NRC participants who should be engaged.

Build NRC staff skills to the level needed for analyzing real-life events, learning more deeply than in current practice, and identifying improvements over current practice.

7. References

[1] Leveson, Nancy, CAST Handbook: How to Learn More from Incidents and Accidents, 2019, available at http://psas.scripts.mit.edu/home/materials/.

8. Appendix A: NRC Participant Responses to Closed-Ended Survey Questions

Participants from the U.S. Nuclear Regulatory Commission (NRC) reported that Causal Analysis Based on System Theory (CAST) was understandable and learnable, as shown in Figure A-1.

This finding is reinforced by the results of CAST exercises performed by NRC participants.

Figure A-1: Participant evaluation of CAST understandability. [Bar charts of Yes/No responses, as % of NRC participant responses (0 to 100), to "Overall, did you understand the topics presented in today's CAST Seminar?" (Day 1), the same question (Day 2), and "Overall, did you understand the topics presented in the CAST Workshop?"]

As shown in Figure A-2, NRC participants reported that the agency would benefit from using or requiring CAST. Participants were also asked to name specific NRC groups that would benefit, as described in Appendix B.

Figure A-2: NRC benefit from CAST. [Bar chart of Yes/No responses, as % of NRC participant responses (0 to 100), to "Would the NRC benefit from performing or requiring CAST?"]

NRC participants indicated that CAST would enable the NRC to do its job more effectively, as shown in Figure A-3.

Figure A-3: Evaluation of CAST effectiveness in achieving NRC objectives. [Bar chart of Yes/No responses, as % of NRC participant responses (0 to 100), to "Would CAST enable the NRC to do its job more effectively?"]

As shown in Figure A-4, NRC participants indicated that CAST would help address real challenges in the nuclear industry.

Figure A-4: Evaluation of CAST alignment with real challenges in nuclear industry. [Bar chart of Yes/No responses, as % of NRC participant responses (0 to 100), to "Would CAST help address real challenges in your industry?"]

Appendix B: NRC Participant Responses to Open-Ended Survey Questions

What groups at the U.S. Nuclear Regulatory Commission (NRC) would benefit from Causal Analysis Based on System Theory (CAST)?

NRC participants' verbatim answers³:

Resident inspectors are much closer to events and may be able to apply or encourage application of these concepts.

Staff in human factors engineering
Resident and regional inspectors
Staff conducting operating experience review
Management
More people from the licensing offices
I think that there should be staff in the Nuclear Materials Safety and Safeguards office (both for fuel processing and nuclear medicine/byproduct materials) who could benefit.

Division of Systems Analysis (DSS) and Division of Risk Analysis (DRA) personnel
Much higher-level management (Office of the Executive Director for Operations (EDO) and Commissioner Technical Assistants (TAs))

Inspectors
Probabilistic Risk Assessment (PRA) [Staff]

I think exposure to CAST would be useful to all of NRC. Reviewing or applying an event using CAST would require assessment and interaction among multiple groups.

Senior level management and rulemaking groups, who ultimately may need to be involved with and buy into the conclusion that the safety increase of applying CAST is cost-justified compared to existing licensee practice.

Whichever NRC staff might initiate guidance to industry on acceptable methods to perform and document accident contributors as part of continuous learning.

Top candidates: Inspectors; staff in the Operating Experience Branch.

Staff in the Office of Nuclear Regulatory Research (RES) Division of Risk Analysis (RES/DRA) responsible for human factors engineering and operating experience research and Division of Engineering (RES/DE) responsible for Research and Development (R&D) in Digital Instrumentation and Controls (DI&C) and Electrical Engineering (EE).

³ Acronyms have been spelled out. Minor clarifications for readability added in square brackets.

TTC runs the inspector training including root cause evaluation. Safety culture/human factors groups would also benefit (RES, Operating Experience (OE), and Office of Nuclear Reactor Regulation (NRR)).

I would argue that at the NRC, most everyone would benefit. Even for people who are not practitioners in this area, they can learn a great way of looking at things. A very valuable awakening of their perspective.

I think NRC management should be engaged. It seems we are more reliant on licensees' programmatic and PRA approach to discuss plant safety rather [than] weaknesses in plant designs; the DC loss of engine mentality is parallel to NPP culture.

Office of Nuclear Material Safety and Safeguards (NMSS): fuel processing facilities, industrial radiographers, etc. and radiation therapy (i.e., nuclear medicine)

Inspectors, technical reviewers, management
Cybersecurity Inspectors. The technique is flexible and it seems like it could be applied to almost any problem.

Operating Experience (OpE). The current practice of the NRC with respect to operational experience is that the NRC helps/facilitates industry learning from each other's mishaps.

The NRC could look at certain types of OpE (e.g., no random hardware failures) to see if there are any systematic contributors, CAST would be one way of doing that. Basically, nothing is perfect, and everything could be improved. Why not look to see if we can do better?

Operating Experience review staff
Researchers
New/Advanced and operating reactor licensing staff who may support special inspections
Regional Inspectors
Those in the Regions/Headquarters (HQ) who establish policies and procedures for regional inspectors to follow
Vendor Inspectors
Those in the Regions/HQ who establish policies and procedures for vendor inspectors to follow
Staff that support the Allegations process
Those in the Regions/HQ who establish policies and procedures to investigate allegations

Staff in the Operating Experience section in NRR
Resident inspectors
Licensees
People that work directly with the licensees
Processes for which improvement potential is recognized (either by the stakeholders or by the staff or by the leadership)

What other industry groups would benefit from CAST?

NRC participants' verbatim answers:

INPO (Institute of Nuclear Power Operations) whose mission is to promote the highest levels of safety and reliability (to promote excellence) in the operation of commercial nuclear power plants.

Department of Energy (DOE) labs (like Idaho National Laboratory (INL)) with interest in applying (1) new techniques to equipment condition monitoring to eliminate contributors to event reporting (e.g., component behaviors/failures that cause events or cause a limiting condition of operation like minimum equipment inventory to be unmet); and (2) greater plant automation to reduce operating costs. DOE lab efforts, in part, seem to have a potential side-effect (or objective) to alter/reduce the traditional role of INPO.

Licensees as a replacement for typical root cause analysis reports.

Obviously licensees. But also licensee contractors, especially Architect-engineers and vendors of control system equipment.

Where might CAST be used at the NRC?

NRC participants' verbatim answers:

Internal discovery of issues and resolution of business issues that could be improved. If licensees were required to use or to provide analysis of operating experience using CAST, then the NRC would have to review the reports generated using CAST.

By augmented inspection teams that are deployed to sites in response to significant licensee event reports.

By staff that review operating experience and regional staff who together discuss initial licensee event reports (morning calls) to frame follow-up activities with the licensees.

By generic communication branch staff in the creation of information notices (and other documents) that describe operational experience of general interest to sets of licensees.

Review of LERs and root cause analyses (RCAs) and formulating questions for additional specific information.

Possibly with resident inspectors or those sent on special inspections who are closer to event analysis.

Event analysis. It would be used to provide context for future design and inspection guidance development.

Accident analysis and prediction. Really good way to truly understand what has happened and why versus just who is at fault.

Obviously useful for event analysis, maybe also useful for evaluation of some of our own processes and procedures (even when there is not obvious failure).

Incident investigation
Developing questions (e.g., Requests for Additional Information) for licensing requests
Licensee Event Report (LER), but should review past, multiple LERs to see where the chips fall
Analysis of OE
At the NRC, it might be used in policy so that we can have Nuclear Power Plants (NPPs) adhere to it.

All review activities
Analysis of operating experience events. Updating of what could/should be included in LERs for better insight into how to prevent reportable incidents.

Help identify structural or contextual factors to inefficiencies or ineffectivenesses.

More detailed investigations, such as those performed by special inspection teams in response to safety significant events, in order to derive deeper insights and create subsequent communications and identify potential improvements to the regulatory infrastructure (regulations, engagement of industry standards, guidance to NRC staff, and guidance to industry)

Learning from operating experience
We can use it to help develop our guidance. Our guidance deals with making sure plants and other uses of nuclear for civilians are safe and adequately protected. Using CAST would help us anticipate a lot of scenarios that we should protect against.

Significant event review: events where NRC is not able to just depend on the licensee's analysis
Analysis of its regulatory oversight processes
Identifying how the regulatory infrastructure could be modernized

What prior knowledge should a participant have, if any, in order to learn successfully from these CAST seminars and workshops?

NRC participants' verbatim answers:

Basic understanding of the processes/systems to be discussed. You shouldn't be struggling with how a plant works (acronyms, concepts, etc.) when taking this.

Some knowledge of the industry and accident being evaluated (very high-level background)

Some idea of alternate analysis techniques
Skill to reason about causes
Ability to extract relevant information from an LER and abstract the information needed to construct a control model
Understanding of engineered systems and the engineering intent to make their behavior analyzable for causal factors
Just a general knowledge of industry in which the incident occurred. More knowledge of alternate analysis techniques.

I don't think that the trip ups are with the CAST process. I think it's more in having experience with analysis of the systems and organizations that are being analyzed.

Having more familiarity would make applying and obtaining insights from the process easier.

They need to know [how] the machinery in the plants work.

Would you like to participate in a follow-on pilot to actively apply CAST to an NRC activity? If so, what activity?

NRC participants' verbatim answers:

Yes. We probably need more LERs on which to practice CAST on. But maybe LERs aren't the best tool to really flesh everything out. The hard part seems to be getting all the required information in order to fill in knowledge gaps about what happened, why it happened, and how it can be prevented in the future.

I think we also need a blank slate example of how the CAST method is comparable or superior to current methods.

Yes. Either (1) some special inspection activity of a licensee that resulted from a TBD future safety significant event; or (2) some focused vendor inspection activity that resulted from either a TBD allegation or TBD Part 21 report.

Yes. Any.

Yes. Activity: Compliance-oriented processes.

Yes, event review or inspection/oversight assessment.