Research on Intelligent Control Method of Operating Temperature of Reactor Thermal System Based on Deep Reinforcement Learning
Abstract: Conventional proportional-integral-derivative (PID) control struggles to achieve good, stable control performance. This paper proposes an intelligent control method, based on deep reinforcement learning, for the operating temperature of a reactor thermal system. The method proceeds in four steps: ① a RELAP5 model of the reactor thermal system is built and extended with an interactive interface so that it can support deep reinforcement learning; ② a multivariable long short-term memory (LSTM) neural network is coupled to the Soft Actor-Critic (SAC) algorithm to effectively extract the temporal features of the control history; ③ driven by the optimization objective, the control model collects data samples on its own and optimizes its control policy through a self-learning mechanism; ④ end-to-end control of the operating temperature is realized from the multivariable state features and temporal features. Comparative simulation experiments against a PID controller show that the proposed method offers excellent load-tracking and disturbance-suppression capability, together with good environmental adaptability and control robustness.
Keywords:
- reactor thermal system
- deep reinforcement learning
- Soft Actor-Critic (SAC)
- long short-term memory (LSTM)
- intelligent control
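The coupling in step ②, a multivariable LSTM feature extractor feeding the SAC policy, can be sketched as below. This is a minimal illustration in PyTorch under stated assumptions: the class name `LSTMSACActor`, the layer sizes, and the 10-step history window are illustrative choices, not the paper's published architecture.

```python
import torch
import torch.nn as nn

class LSTMSACActor(nn.Module):
    """Squashed-Gaussian SAC actor whose features come from a
    multivariable LSTM over a window of past plant states.
    Sizes are illustrative, not the paper's exact architecture."""

    def __init__(self, state_dim=7, action_dim=1, hidden=64):
        super().__init__()
        # LSTM extracts temporal features from the control history
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, action_dim)       # mean of Gaussian
        self.log_std = nn.Linear(hidden, action_dim)  # log-std of Gaussian

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, state_dim) window of recent states
        _, (h, _) = self.lstm(state_seq)
        feat = h[-1]                                  # last hidden state
        mu = self.mu(feat)
        log_std = self.log_std(feat).clamp(-20, 2)
        std = log_std.exp()
        # Reparameterised sample, squashed to (-1, 1) as in SAC
        eps = torch.randn_like(mu)
        pre_tanh = mu + std * eps
        action = torch.tanh(pre_tanh)
        # Log-probability with the tanh change-of-variables correction
        normal = torch.distributions.Normal(mu, std)
        log_prob = normal.log_prob(pre_tanh) - torch.log(1 - action.pow(2) + 1e-6)
        return action, log_prob.sum(dim=-1)

# Example: one control decision from a 10-step history of the 7 state variables
actor = LSTMSACActor()
history = torch.randn(1, 10, 7)
a, logp = actor(history)
```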
Table 1. Parameter Configuration of Simulation Model

| State parameter (unit) | Initial state | Parameter bounds |
| --- | --- | --- |
| Mass flow rate / (kg·s⁻¹) | 0.2086 | [0, 0.3] |
| Pressure / MPa | 0.3 | [0.2, 0.6] |
| Circulation pump angular velocity / (rad·s⁻¹) | 150 | [0, 250] |
| Heater inlet temperature / K | 337.65 | [293.15, 400.00] |
| Heater outlet temperature / K | 354.80 | [293.15, 400.00] |
| Heater power / kW | 15 | [0, 30] |
| Preheater power / kW | 12 | [0, 30] |
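Step ① couples RELAP5 to the learning agent through an interactive interface. A minimal Gym-style sketch of such a wrapper is shown below, using the state variables and bounds of Table 1; the `relap5_step` exchange function, the setpoint, and the reward shaping are hypothetical stand-ins, since the paper does not publish its interface code.

```python
import numpy as np

# State bounds taken from Table 1, used to normalise observations
LOW  = np.array([0.0, 0.2,   0.0, 293.15, 293.15,  0.0,  0.0])
HIGH = np.array([0.3, 0.6, 250.0, 400.00, 400.00, 30.0, 30.0])

class Relap5TempEnv:
    """Gym-style wrapper around an interactive RELAP5 run.
    `relap5_step` is a hypothetical exchange function that advances
    the code by one coupling step and returns the 7 state variables."""

    def __init__(self, relap5_step, target_temp=354.80):
        self.relap5_step = relap5_step
        self.target = target_temp  # heater outlet setpoint / K (assumed)

    def reset(self):
        # Run the model to its initial steady state (interface-specific)
        self.state = self.relap5_step(None)
        return self._obs()

    def step(self, action):
        # Map action in [-1, 1] to a heater power command in [0, 30] kW
        power_cmd = (action + 1.0) / 2.0 * 30.0
        self.state = self.relap5_step(power_cmd)
        outlet_temp = self.state[4]  # heater outlet temperature / K
        # Reward: negative tracking error (illustrative shaping only)
        reward = -abs(outlet_temp - self.target)
        return self._obs(), reward, False, {}

    def _obs(self):
        # Min-max normalise so the networks stay well conditioned
        return (np.asarray(self.state) - LOW) / (HIGH - LOW)
```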
Table 2. Control Performance Evaluation
| Indicator | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 8 step | 8 step | 0 |
| Maximum deviation | 1.05 K | 0.38 K | +63.81 |
| Overshoot | 0.292% | 0.105% | +64.04 |
| Steady-state RMSE | 0.489 K | 0.106 K | +78.32 |
| Steady-state relative error | 0.136% | 0.029% | +78.68 |

Note: step, interaction time step; RMSE, root mean square error.
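The indicators in Tables 2 to 4 can be recomputed from a recorded temperature trajectory. The sketch below uses definitions inferred from the ratios in Table 2 (overshoot is roughly the maximum deviation divided by the target temperature, and the steady-state relative error roughly the RMSE divided by the target); the split index between transient and steady-state samples is an assumption.

```python
import numpy as np

def control_metrics(temps, target, steady_from):
    """Performance indicators of Tables 2-4 for a temperature
    trajectory `temps` (K) tracking setpoint `target` (K); samples
    from index `steady_from` onward are treated as steady state."""
    temps = np.asarray(temps, dtype=float)
    dev = temps - target
    max_dev = np.abs(dev).max()            # maximum deviation / K
    overshoot = max_dev / target * 100.0   # overshoot / % (inferred definition)
    steady = dev[steady_from:]
    rmse = np.sqrt(np.mean(steady ** 2))   # steady-state RMSE / K
    rel_err = rmse / target * 100.0        # steady-state relative error / %
    return max_dev, overshoot, rmse, rel_err
```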
Table 3. Control Performance Evaluation of Target Tracking Experiment
Target change tracking, Target 1 → Target 2:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 20 step | 17 step | +15.00 |
| Maximum deviation | 2.04 K | 0.56 K | +72.55 |
| Overshoot | 0.559% | 0.153% | +72.63 |
| Steady-state RMSE | 0.858 K | 0.312 K | +63.64 |
| Steady-state relative error | 0.235% | 0.085% | +63.83 |

Target change tracking, Target 2 → Target 1:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 18 step | 13 step | +27.78 |
| Maximum deviation | 1.67 K | 0.43 K | +74.25 |
| Overshoot | 0.457% | 0.117% | +74.40 |
| Steady-state RMSE | 0.965 K | 0.271 K | +71.92 |
| Steady-state relative error | 0.264% | 0.074% | +71.97 |

Continuous target change tracking:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Rise time | 5 step | 6 step | −20.00 |
| Maximum deviation | 1.91 K | 0.25 K | +72.55 |
| Overshoot | 0.531% | 0.069% | +86.91 |
| Steady-state RMSE | 1.251 K | 0.109 K | +91.29 |
| Steady-state relative error | 0.347% | 0.030% | +91.35 |
Table 4. Control Performance Evaluation of Disturbance Suppression Experiment
Power disturbance:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Recovery time | 5 step | 6 step | −20.00 |
| Maximum deviation | 1.54 K | 0.20 K | +87.01 |
| Overshoot | 0.428% | 0.056% | +86.92 |
| Steady-state RMSE | 0.607 K | 0.075 K | +87.64 |
| Steady-state relative error | 0.139% | 0.021% | +84.89 |

Flow disturbance:

| Performance metric | PID | SAC coupled with multivariable LSTM | Improvement rate / % |
| --- | --- | --- | --- |
| Recovery time | 7 step | 6 step | +14.29 |
| Maximum deviation | 1.57 K | 0.67 K | +57.32 |
| Overshoot | 0.436% | 0.156% | +64.22 |
| Steady-state RMSE | 0.740 K | 0.210 K | +71.62 |
| Steady-state relative error | 0.205% | 0.058% | +71.71 |
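Step ③, the self-learning mechanism, follows the usual off-policy pattern of SAC: the agent gathers its own transitions from the simulated plant and updates from a replay buffer. A minimal sketch under stated assumptions follows; the `agent` object with its `act` and `update` methods, and all loop sizes, are hypothetical placeholders rather than the authors' training configuration.

```python
import random
from collections import deque

def train(env, agent, episodes=200, steps=100, batch_size=64):
    """Self-collected, self-learned control optimization (step ③):
    the agent samples (s, a, r, s') transitions from the RELAP5
    environment and improves its policy off-policy from a replay
    buffer. `agent.update` is assumed to apply the SAC losses."""
    buffer = deque(maxlen=100_000)
    for ep in range(episodes):
        obs = env.reset()
        for _ in range(steps):
            action = agent.act(obs)                  # sample from current policy
            nxt, reward, done, _ = env.step(action)
            buffer.append((obs, action, reward, nxt))
            obs = nxt
            if len(buffer) >= batch_size:
                agent.update(random.sample(buffer, batch_size))
            if done:
                break
```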