Research on Intelligent Control Method of Operating Temperature of Reactor Thermal System Based on Deep Reinforcement Learning
Abstract: Conventional proportional-integral-derivative (PID) control struggles to achieve good, stable control performance. This paper proposes an intelligent control method, based on deep reinforcement learning, for the operating temperature of a reactor thermal system. The method proceeds in four steps: ① a RELAP5 model of the reactor thermal system is built and extended with an interactive interface so that it can support deep reinforcement learning; ② a multivariable long short-term memory (LSTM) neural network is coupled to the Soft Actor-Critic (SAC) algorithm to effectively extract the temporal features of the control history; ③ driven by the optimization objective, the control model collects data samples on its own and optimizes its control policy through a self-learning mechanism; ④ end-to-end control of the operating temperature is realized from the multivariable state features and temporal features. Comparative simulation experiments against a PID controller show that the proposed method offers excellent load-tracking and disturbance-suppression capability, together with good environmental adaptability and control robustness.
Keywords:
- reactor thermal system
- deep reinforcement learning
- Soft Actor-Critic (SAC)
- long short-term memory (LSTM)
- intelligent control
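The coupling in step ②, a multivariable LSTM feature extractor feeding the SAC policy, can be sketched as below. This is a minimal illustration in PyTorch under stated assumptions: the class name `LSTMSACActor`, the layer sizes, and the 10-step history window are illustrative choices, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class LSTMSACActor(nn.Module):
    """Squashed-Gaussian SAC actor whose features come from a
    multivariable LSTM over a window of past plant states.
    Sizes are illustrative, not the paper's exact architecture."""

    def __init__(self, state_dim=7, action_dim=1, hidden=64):
        super().__init__()
        # LSTM extracts temporal features from the control history
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, action_dim)       # mean of Gaussian
        self.log_std = nn.Linear(hidden, action_dim)  # log-std of Gaussian

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, state_dim) window of recent states
        _, (h, _) = self.lstm(state_seq)
        feat = h[-1]                                  # last hidden state
        mu = self.mu(feat)
        log_std = self.log_std(feat).clamp(-20, 2)
        std = log_std.exp()
        # Reparameterised sample, squashed to (-1, 1) as in SAC
        eps = torch.randn_like(mu)
        pre_tanh = mu + std * eps
        action = torch.tanh(pre_tanh)
        # Log-probability with the tanh change-of-variables correction
        normal = torch.distributions.Normal(mu, std)
        log_prob = normal.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

# Example: one control decision from a 10-step history of the 7 state variables
actor = LSTMSACActor()
history = torch.randn(1, 10, 7)
a, logp = actor(history)
```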
Table 1. Parameter Configuration of Simulation Model

| State parameter (unit) | Initial state | Parameter bounds |
| --- | --- | --- |
| Mass flow rate / (kg·s⁻¹) | 0.2086 | [0, 0.3] |
| Pressure / MPa | 0.3 | [0.2, 0.6] |
| Circulation pump angular velocity / (rad·s⁻¹) | 150 | [0, 250] |
| Heater inlet temperature / K | 337.65 | [293.15, 400.00] |
| Heater outlet temperature / K | 354.80 | [293.15, 400.00] |
| Heater power / kW | 15 | [0, 30] |
| Preheater power / kW | 12 | [0, 30] |
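Step ① couples RELAP5 to the learning agent through an interactive interface. A minimal Gym-style sketch of such a wrapper is shown below, using the state variables and bounds of Table 1; the `relap5_step` exchange function, the setpoint, and the reward shaping are hypothetical stand-ins, since the paper does not publish its interface code.

```python
import numpy as np

# State bounds taken from Table 1, used to normalise observations
LOW  = np.array([0.0, 0.2,   0.0, 293.15, 293.15,  0.0,  0.0])
HIGH = np.array([0.3, 0.6, 250.0, 400.00, 400.00, 30.0, 30.0])

class Relap5TempEnv:
    """Gym-style wrapper around an interactive RELAP5 run.
    `relap5_step` is a hypothetical exchange function that advances
    the code by one coupling step and returns the 7 state variables."""

    def __init__(self, relap5_step, target_temp=354.80):
        self.relap5_step = relap5_step
        self.target = target_temp  # heater outlet setpoint / K (assumed)

    def reset(self):
        # Run the model to its initial steady state (interface-specific)
        self.state = self.relap5_step(None)
        return self._obs()

    def step(self, action):
        # Map action in [-1, 1] to a heater power command in [0, 30] kW
        power_cmd = (action + 1.0) / 2.0 * 30.0
        self.state = self.relap5_step(power_cmd)
        outlet_temp = self.state[4]  # heater outlet temperature / K
        # Reward: negative tracking error (illustrative shaping only)
        reward = -abs(outlet_temp - self.target)
        return self._obs(), reward, False, {}

    def _obs(self):
        # Min-max normalise so the networks stay well conditioned
        return (np.asarray(self.state) - LOW) / (HIGH - LOW)
```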
Table 2. Control Performance Evaluation
| Indicator | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 8 step | 8 step | 0 |
| Maximum deviation | 1.05 K | 0.38 K | +63.81 |
| Overshoot | 0.292% | 0.105% | +64.04 |
| Steady-state RMSE | 0.489 K | 0.106 K | +78.32 |
| Steady-state relative error | 0.136% | 0.029% | +78.68 |

Note: step, interaction time step; RMSE, root mean square error.
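The indicators in Tables 2 to 4 can be recomputed from a recorded temperature trajectory. The sketch below uses definitions inferred from the ratios in Table 2 (overshoot is roughly the maximum deviation divided by the target temperature, and the steady-state relative error roughly the RMSE divided by the target); the split index between transient and steady-state samples is an assumption.

```python
import numpy as np

def control_metrics(temps, target, steady_from):
    """Performance indicators of Tables 2-4 for a temperature
    trajectory `temps` (K) tracking setpoint `target` (K); samples
    from index `steady_from` onward are treated as steady state."""
    temps = np.asarray(temps, dtype=float)
    dev = temps - target
    max_dev = np.abs(dev).max()            # maximum deviation / K
    overshoot = max_dev / target * 100.0   # overshoot / % (inferred definition)
    steady = dev[steady_from:]
    rmse = np.sqrt(np.mean(steady ** 2))   # steady-state RMSE / K
    rel_err = rmse / target * 100.0        # steady-state relative error / %
    return max_dev, overshoot, rmse, rel_err
```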
Table 3. Control Performance Evaluation of Target Tracking Experiment
Target change tracking, Target 1 → Target 2:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 20 step | 17 step | +15.00 |
| Maximum deviation | 2.04 K | 0.56 K | +72.55 |
| Overshoot | 0.559% | 0.153% | +72.63 |
| Steady-state RMSE | 0.858 K | 0.312 K | +63.64 |
| Steady-state relative error | 0.235% | 0.085% | +63.83 |

Target change tracking, Target 2 → Target 1:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 18 step | 13 step | +27.78 |
| Maximum deviation | 1.67 K | 0.43 K | +74.25 |
| Overshoot | 0.457% | 0.117% | +74.40 |
| Steady-state RMSE | 0.965 K | 0.271 K | +71.92 |
| Steady-state relative error | 0.264% | 0.074% | +71.97 |

Continuous target change tracking:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 5 step | 6 step | −20.00 |
| Maximum deviation | 1.91 K | 0.25 K | +72.55 |
| Overshoot | 0.531% | 0.069% | +86.91 |
| Steady-state RMSE | 1.251 K | 0.109 K | +91.29 |
| Steady-state relative error | 0.347% | 0.030% | +91.35 |
Table 4. Control Performance Evaluation of Disturbance Suppression Experiment
Power disturbance:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Recovery time | 5 step | 6 step | −20.00 |
| Maximum deviation | 1.54 K | 0.20 K | +87.01 |
| Overshoot | 0.428% | 0.056% | +86.92 |
| Steady-state RMSE | 0.607 K | 0.075 K | +87.64 |
| Steady-state relative error | 0.139% | 0.021% | +84.89 |

Flow disturbance:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Recovery time | 7 step | 6 step | +14.29 |
| Maximum deviation | 1.57 K | 0.67 K | +57.32 |
| Overshoot | 0.436% | 0.156% | +64.22 |
| Steady-state RMSE | 0.740 K | 0.210 K | +71.62 |
| Steady-state relative error | 0.205% | 0.058% | +71.71 |
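Step ③, the self-learning mechanism, follows the usual off-policy pattern of SAC: the agent gathers its own transitions from the simulated plant and updates from a replay buffer. A minimal sketch under stated assumptions follows; the `agent` object with its `act` and `update` methods, and all loop sizes, are hypothetical placeholders rather than the authors' training configuration.

```python
import random
from collections import deque

def train(env, agent, episodes=200, steps=100, batch_size=64):
    """Self-collected, self-learned control optimization (step ③):
    the agent samples (s, a, r, s') transitions from the RELAP5
    environment and improves its policy off-policy from a replay
    buffer. `agent.update` is assumed to apply the SAC losses."""
    buffer = deque(maxlen=100_000)
    for ep in range(episodes):
        obs = env.reset()
        for _ in range(steps):
            action = agent.act(obs)                  # sample from current policy
            nxt, reward, done, _ = env.step(action)
            buffer.append((obs, action, reward, nxt))
            obs = nxt
            if len(buffer) >= batch_size:
                agent.update(random.sample(buffer, batch_size))
            if done:
                break
```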