Optimization of Nuclear Power Accident Diagnosis Procedures Based on SAC Reinforcement Learning

Zhang Dazhi^1, Wang Zhihui^1, Zhou Huabing^{2,3}, Fu Yongjie^{2,3}, Xi Jiaxuan^{2,3}

doi: 10.13832/j.jnpe.2024.S1.0085

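1. CNNC Key Laboratory on Nuclear Industry Simulation, China Nuclear Power Operation Technology Corporation, Ltd., Wuhan, 430040, China
2. College of Computer Science and Engineering, College of Artificial Intelligence, Wuhan Institute of Technology, Wuhan, 430205, China
3. Hubei Provincial Key Laboratory of Intelligent Robot, Wuhan Institute of Technology, Wuhan, 430205, China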

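Details

Author biography:
Zhang Dazhi (born 1977), male, senior engineer, mainly engaged in research on nuclear power simulation and industrial artificial intelligence, E-mail: zhangdz02@cnnp.com.cn

Corresponding author:
Zhou Huabing, E-mail: zhouhuabing@gmail.com

CLC number: TL334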
Publication history
- Received: 2024-01-01
- Revised: 2024-04-11
- Published: 2024-06-15


Abstract

This paper proposes a method for optimizing nuclear power accident diagnosis procedures based on the Soft Actor-Critic (SAC) reinforcement learning algorithm. Taking a decision tree model as its foundation, the method optimizes the judgment strategy of the accident detection procedure, significantly improving detection performance while preserving the interpretability of the decision model. SAC serves as the reinforcement learning algorithm: the state is defined as a combination of current operating data and historical data, the action is an adjustment of the decision thresholds in the diagnostic procedure, and the reward reflects diagnostic accuracy. Driven by SAC, the system repeatedly adjusts the thresholds to optimize the policy and obtain the best diagnostic performance. In a simulated Main Steam Line Break (MSLB) accident scenario, the model adapts better to complex high-dimensional data and finds the optimal control strategy under the specified performance indicators, with accuracy steadily approaching 1. The proposed method significantly reduces the misdiagnosis rate: it detects nuclear power accidents more accurately and also performs well in reducing false alarms, thereby improving the safety of nuclear power operation.

Keywords:
- Nuclear power accident
- Reinforcement learning
- Procedure optimization
- MSLB
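The state, action, and reward described in the abstract can be illustrated with a small environment sketch. This is only a toy under stated assumptions, not the authors' implementation: the class name `ThresholdTuningEnv`, the rule inside `diagnose`, the clipping bound `max_delta`, and the synthetic readings are all hypothetical, and the SAC agent that would choose the actions is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Reading = Tuple[float, ...]  # one snapshot of monitored plant parameters


@dataclass
class ThresholdTuningEnv:
    """Toy environment: the agent nudges decision thresholds; the reward is accuracy."""
    readings: Sequence[Reading]   # time-ordered operating data (synthetic here)
    labels: Sequence[bool]        # True where the snapshot belongs to an accident
    thresholds: List[float]       # one adjustable threshold per parameter
    history: int = 1              # number of past snapshots included in the state
    max_delta: float = 0.5        # largest threshold change allowed per step
    t: int = field(default=0, init=False)

    def diagnose(self, reading: Reading) -> bool:
        # Hypothetical rule: flag an accident when every parameter is below its threshold.
        return all(x < th for x, th in zip(reading, self.thresholds))

    def accuracy(self) -> float:
        hits = sum(self.diagnose(r) == y for r, y in zip(self.readings, self.labels))
        return hits / len(self.readings)

    def state(self) -> List[float]:
        # Current snapshot followed by `history` earlier ones (clamped at the start).
        idx = [max(self.t - k, 0) for k in range(self.history + 1)]
        return [x for i in idx for x in self.readings[i]]

    def step(self, action: Sequence[float]) -> Tuple[List[float], float]:
        # Action = bounded adjustment of each threshold.
        for i, delta in enumerate(action):
            self.thresholds[i] += max(-self.max_delta, min(self.max_delta, delta))
        self.t = (self.t + 1) % len(self.readings)
        return self.state(), self.accuracy()


# Synthetic two-parameter data: the last two snapshots are labelled as accidents.
env = ThresholdTuningEnv(
    readings=[(7.0, 15.5), (6.8, 15.2), (4.1, 12.0), (3.5, 11.0)],
    labels=[False, False, True, True],
    thresholds=[3.0, 10.0],
    max_delta=3.0,
)
before = env.accuracy()                 # accuracy with the initial thresholds
state, reward = env.step([2.0, 2.5])    # raise both thresholds
```

In this sketch the learned quantities are the thresholds themselves, so the diagnostic rule stays readable after training, which is the interpretability property the abstract emphasizes. An off-the-shelf SAC implementation could be attached by mapping its continuous action vector onto the `step` argument.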

Figure 1. Logic Diagram of Decision-making and Control for Nuclear Power Accident Detection

Figure 2. Decision Tree Related to Nuclear Power Accident Detection

P_R: primary loop pressure; P_i: pressure of steam generator i#, i = 1 to 3; P_H: containment pressure

Figure 3. Accident Trend Chart of an Important Parameter, 1# Steam Generator (SG) Pressure

Figure 4. Trend of Accuracy with the Number of Training Rounds
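Accuracy and false alarms are the two quantities the abstract reports. As a point of reference, the sketch below computes them from binary diagnoses using the standard confusion-matrix definitions. The function name `diagnosis_metrics`, the dictionary keys, and the example labels are illustrative assumptions, not taken from the paper, which does not state exactly how its metrics were computed.

```python
from typing import Dict, Sequence


def diagnosis_metrics(predicted: Sequence[bool], actual: Sequence[bool]) -> Dict[str, float]:
    """Accuracy, false alarm rate, and miss rate for binary accident diagnoses."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and of equal length")

    # Confusion-matrix counts.
    tp = sum(p and a for p, a in zip(predicted, actual))          # accident correctly flagged
    tn = sum(not p and not a for p, a in zip(predicted, actual))  # normal correctly passed
    fp = sum(p and not a for p, a in zip(predicted, actual))      # false alarm
    fn = sum(not p and a for p, a in zip(predicted, actual))      # missed accident

    return {
        "accuracy": (tp + tn) / len(actual),
        # Share of normal snapshots wrongly flagged as accidents.
        "false_alarm_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        # Share of accident snapshots that were not flagged.
        "miss_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up example: one false alarm among three normal snapshots, no misses.
metrics = diagnosis_metrics(
    predicted=[True, True, False, False, True],
    actual=[True, False, False, False, True],
)
```

Evaluating such a function once per training round would yield a curve of the kind shown in Figure 4.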

Figure 5. Trend of the Threshold of Each Node with the Number of Training Rounds
