Study on CUDA-based Heterogeneous Parallel for Advanced Assembly Neutronics Program

ZHENG Yong; LU Wei; MA Yongqiang; CUI Xiantao; GUO Fengchen; MA Dangwei; TU Xiaolan

doi:10.13832/j.jnpe.2021.S2.0124

Science and Technology on Reactor System Design Technology Laboratory, Nuclear Power Institute of China, Chengdu, 610213, China

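About the author:
ZHENG Yong (born 1989), male, engineer, currently works mainly on the research and development of reactor core physics codes. E-mail: zhengyong@hrbeu.edu.cn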

CLC number: TL329.2
Publication history
- Received: 2021-07-19
- Revised: 2021-11-08
- Accepted: 2021-12-06
- Published: 2021-12-29

Abstract: This work aims to improve the computational performance of the advanced assembly neutronics code KYLIN-II on problems with complicated boundary conditions. To that end, KYLIN-II was parallelized heterogeneously on programmable graphics cards (GPUs). Massively multithreaded parallel computation was implemented for the resonance and transport modules, and the number of atomic operations in the heterogeneous parallel code was reduced by optimizing the iteration strategy. To verify the accuracy and speedup of the heterogeneous parallel code, three test problems were calculated: an AFA3G super assembly, a hexagonal plate-type fuel assembly and a multilayer sleeve-type fuel cell. The results lead to three findings. First, heterogeneous parallelization does not affect the accuracy of the results: k_eff is identical to six decimal places on the CPU and on both GPUs. Second, KYLIN-II running on a single graphics card (Quadro K6000) achieves speedups of more than 10. Third, optimizing the iteration process effectively reduces the computing time. Compared with the traditional multi-core parallel approach based on central processing units (CPUs), GPU heterogeneous parallelism significantly reduces the economic cost of large-scale parallel computation with KYLIN-II. It is therefore a promising direction for further parallel optimization of the code.
Keywords:
- KYLIN-II
- Parallel technology
- Heterogeneous parallel
- Atomic operation
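
"Massive thread parallel computing" usually means launching one lightweight GPU thread per independent unit of sweep work. The paper's actual thread mapping is not described in this text. The minimal sketch below, written in Python for readability, assumes a hypothetical layout in which each thread owns one (track, polar angle, energy group) triple. It shows how a flat CUDA-style global index (`blockIdx.x * blockDim.x + threadIdx.x`) is split back into those indices and how many blocks a launch needs. The helper names `grid_size`, `decompose` and `thread_work` are illustrative only.

```python
def grid_size(n_work: int, block_size: int) -> int:
    """Number of thread blocks needed to cover n_work items (ceiling division)."""
    return (n_work + block_size - 1) // block_size


def decompose(tid: int, n_polar: int, n_groups: int) -> tuple[int, int, int]:
    """Split a flat global thread index into (track, polar, group).

    The group index varies fastest, so consecutive threads share a track and
    polar angle but handle different energy groups (an assumed layout).
    """
    group = tid % n_groups
    polar = (tid // n_groups) % n_polar
    track = tid // (n_groups * n_polar)
    return track, polar, group


def thread_work(block_idx: int, block_dim: int, thread_idx: int,
                n_tracks: int, n_polar: int, n_groups: int):
    """Emulate the index arithmetic at the top of a CUDA kernel.

    Returns (track, polar, group), or None for surplus threads in the last block.
    """
    tid = block_idx * block_dim + thread_idx  # blockIdx.x * blockDim.x + threadIdx.x
    if tid >= n_tracks * n_polar * n_groups:
        return None
    return decompose(tid, n_polar, n_groups)


# Hypothetical problem size: 5 tracks x 3 polar angles x 2 groups = 30 work items
n_tracks, n_polar, n_groups, block_dim = 5, 3, 2, 8
n_blocks = grid_size(n_tracks * n_polar * n_groups, block_dim)  # 4 blocks, 32 threads
work = [thread_work(b, block_dim, t, n_tracks, n_polar, n_groups)
        for b in range(n_blocks) for t in range(block_dim)]
assigned = [w for w in work if w is not None]
```

Putting the group index fastest is only one option. Making the track index fastest is another, and which layout coalesces memory better depends on how cross sections and fluxes are stored.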

Figure 1. Two Variable Iteration Strategies for Scanning Process
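
Figure 1 contrasts two variable-iteration strategies for the sweep, but the figure itself is not reproduced here. As an illustration only, the sketch below assumes one common way to cut atomic operations in a GPU method-of-characteristics sweep.

- **Unfused:** one logical thread per (track, polar angle) issues an atomic add to the region tally for every segment.
- **Fused:** one thread per track loops over the polar angles, sums their contributions in a local (register) variable, and then issues a single atomic add per segment.

The segment update is the standard flat-source characteristic solution. The unit incoming flux, the quadrature values, the cross sections and the function name `sweep` are all hypothetical, and plain Python additions stand in for `atomicAdd`.

```python
import math


def sweep(tracks, sin_polar, w_polar, sigma_t, q, n_regions, fuse_polar):
    """Sweep all tracks for one energy group; return (tally, n_atomic).

    tracks     : list of tracks, each a list of (region, 2D segment length)
    sin_polar  : sines of the polar angles (3D length = 2D length / sin)
    w_polar    : polar quadrature weights
    fuse_polar : False -> one atomic add per (segment, polar angle)
                 True  -> polar contributions summed locally, then one atomic
                          add per segment
    The tally holds sum(w * delta_psi) per region; turning it into a scalar
    flux needs further normalisation that is omitted here.
    """
    tally = [0.0] * n_regions
    n_atomic = 0
    for segments in tracks:
        psi = [1.0] * len(sin_polar)  # assumed unit incoming angular flux
        for region, length in segments:
            st = sigma_t[region]
            local = 0.0  # register accumulator used in the fused case
            for p, (s, w) in enumerate(zip(sin_polar, w_polar)):
                tau = st * length / s  # optical length of the 3D segment
                delta = (psi[p] - q[region] / st) * (1.0 - math.exp(-tau))
                psi[p] -= delta
                if fuse_polar:
                    local += w * delta
                else:
                    tally[region] += w * delta  # stands in for atomicAdd
                    n_atomic += 1
            if fuse_polar:
                tally[region] += local  # one atomicAdd per segment
                n_atomic += 1
    return tally, n_atomic


tracks = [[(0, 0.5), (1, 1.0), (0, 0.5)], [(1, 2.0)]]  # 4 segments in total
args = (tracks, [0.5, 0.9], [0.4, 0.6], [1.0, 0.5], [0.2, 0.0], 2)
tally_a, atomics_a = sweep(*args, fuse_polar=False)
tally_b, atomics_b = sweep(*args, fuse_polar=True)
print(atomics_a, atomics_b)  # 8 4
```

Both orderings give the same tallies up to floating-point rounding. The fused one halves the atomics here (one per segment instead of one per segment and polar angle), at the cost of launching fewer, longer-running threads.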

Figure 2. Geometric Mesh Partition for AFA3G Super Assembly

Figure 3. Comparison of Acceleration Effects of Different Schemes of AFA3G Super Assembly

Figure 4. Geometric Mesh Partition for Hexagonal Plate Fuel Assembly

Figure 5. Material Configuration of Fuel Plate

Figure 6. Comparison of Acceleration Effects of Different Schemes of Hexagonal Plate Fuel Assembly

Figure 7. Geometric Structure and Material Layout of Multilayer Sleeve Fuel Cell

Figure 8. Comparison of Acceleration Effects of Different Schemes of Sleeve Fuel

Table 1. Hardware Parameter Specifications for Heterogeneous Platform

Hardware	Specifications
CPU	Intel Haswell E5 CPU, clock 2.6 GHz
Quadro K6000	Compute capability 3.5, core clock 902 MHz, memory clock 3004 MHz, 2880 CUDA cores
Tesla K20c	Compute capability 3.5, core clock 706 MHz, memory clock 2600 MHz, 2496 CUDA cores
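
Table 1 lists core counts and clocks but not peak throughput. A common back-of-the-envelope estimate for these Kepler-generation cards is CUDA cores × core clock × 2, counting one fused multiply-add as two floating-point operations per core per cycle. The sketch applies that estimate to the Table 1 values. It ignores boost clocks and memory bandwidth, which can matter as much as arithmetic for memory-bound sweeps.

```python
def peak_sp_gflops(cuda_cores: int, core_clock_mhz: float) -> float:
    """Peak single-precision GFLOP/s, assuming one FMA (2 flops) per core per cycle."""
    return cuda_cores * core_clock_mhz * 2 / 1000.0


# Core counts and core clocks taken from Table 1
quadro_k6000 = peak_sp_gflops(2880, 902)
tesla_k20c = peak_sp_gflops(2496, 706)
print(f"K6000 {quadro_k6000:.0f} GFLOP/s, K20c {tesla_k20c:.0f} GFLOP/s, "
      f"ratio {quadro_k6000 / tesla_k20c:.2f}")
```

The estimated ratio of about 1.47 accounts for only part of the measured gap. In Table 2, for example, the K20c/K6000 ratio of per-group ray-scanning times is 648.2/399.6, or about 1.62.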

Table 2. Comparison of Numerical Results of AFA3G Super Assembly

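Hardware	Optimized	Floating-point precision	k_eff	Resonance module time/s	Transport module time/s	Ray-scanning time per group/ms
CPU serial	No	Double	1.047382	1459.4 (1454.2)	2721.0 (2706.2)①	4622.7
Quadro K6000	No	Single	1.047382	131.3 (124.8)	251.0 (234.8)	399.6
Quadro K6000	No	Double	1.047382	184.7 (177.9)	355.1 (336.8)	571.9
Quadro K6000	Yes	Single	1.047382	111.0 (107.6)	212.2 (199.8)	341.6
Quadro K6000	Yes	Double	1.047382	132.5 (129.0)	254.6 (242.1)	412.3
Tesla K20c	No	Single	1.047382	209.6 (204.5)	393.8 (379.0)	648.2
Tesla K20c	No	Double	1.047382	253.4 (248.0)	478.1 (463.7)	790.3
Tesla K20c	Yes	Single	1.047382	176.7 (173.9)	332.8 (322.7)	551.6
Tesla K20c	Yes	Double	1.047382	195.4 (192.7)	367.2 (357.2)	610.4
Note: ① Times in parentheses are the sum of the module's ray-scanning time and the CPU/GPU data-copy time; they exclude coarse-mesh acceleration and other steps.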
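
The abstract's "more than 10 times" claim can be checked against Table 2. The speedup is the CPU serial time divided by the GPU time for the transport module. The sketch below computes it for every GPU configuration; the values are copied from the table, and the dictionary layout is only a convenience.

```python
# Transport-module wall times (s) from Table 2, AFA3G super assembly
CPU_SERIAL = 2721.0
GPU_TIMES = {
    ("Quadro K6000", "not optimized", "single"): 251.0,
    ("Quadro K6000", "not optimized", "double"): 355.1,
    ("Quadro K6000", "optimized", "single"): 212.2,
    ("Quadro K6000", "optimized", "double"): 254.6,
    ("Tesla K20c", "not optimized", "single"): 393.8,
    ("Tesla K20c", "not optimized", "double"): 478.1,
    ("Tesla K20c", "optimized", "single"): 332.8,
    ("Tesla K20c", "optimized", "double"): 367.2,
}

# Speedup = serial CPU time / GPU time for the same module
speedups = {cfg: CPU_SERIAL / t for cfg, t in GPU_TIMES.items()}
for cfg, s in sorted(speedups.items(), key=lambda kv: -kv[1]):
    print(f"{' / '.join(cfg):40s} {s:5.2f}x")
```

On these numbers only three Quadro K6000 configurations exceed 10x: both single-precision runs and the optimized double-precision run. The best is about 12.8x, with optimization in single precision. The Tesla K20c stays between roughly 5.7x and 8.2x.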

Table 3. Comparison of Numerical Results of Hexagonal Plate Fuel Assembly

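Hardware	Optimized	Floating-point precision	k_eff	Resonance module time/s	Transport module time/s	Ray-scanning time per group/ms
CPU serial	No	Double	1.630950	69.9 (69.1)	24442.4 (24323.9)②	3890.4
Quadro K6000	No	Single	1.630950	7.6 (6.7)	2259.7 (2093.3)	334.9
Quadro K6000	No	Double	1.630950	8.5 (7.6)	2544.3 (2375.3)	380.1
Quadro K6000	Yes	Single	1.630950	6.8 (6.1)	2017.4 (1894.7)	303.2
Quadro K6000	Yes	Double	1.630950	7.5 (6.7)	2165.1 (2044.2)	327.1
Tesla K20c	No	Single	1.630950	13.2 (12.3)	3531.8 (3406.7)	545.3
Tesla K20c	No	Double	1.630950	13.7 (12.8)	3638.3 (3513.4)	562.4
Tesla K20c	Yes	Single	1.630950	11.9 (11.3)	3150.9 (3060.4)	489.9
Tesla K20c	Yes	Double	1.630950	12.5 (11.9)	3276.0 (3185.4)	510.0
Note: ② Times in parentheses are the sum of the module's ray-scanning time and the CPU/GPU data-copy time; they exclude coarse-mesh acceleration and other steps.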
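
The note says the bracketed time covers ray scanning plus CPU/GPU data copying. The gap between the total and the bracketed value is therefore the time spent in coarse-mesh acceleration and other steps. A rough Amdahl's-law reading of Table 3 bounds how much more speedup faster ray scanning alone could give. It assumes, purely for illustration, that this remainder would stay as measured.

```python
# Transport-module times (s) from Table 3: (total, ray scanning + data copy)
TABLE3 = {
    "CPU serial": (24442.4, 24323.9),
    "Quadro K6000, optimized, single": (2017.4, 1894.7),
    "Tesla K20c, optimized, single": (3150.9, 3060.4),
}


def amdahl_view(total: float, scan_and_copy: float) -> tuple[float, float, float]:
    """Return (remainder in s, remainder fraction, max further speedup).

    The max further speedup is total / remainder: the limit reached if ray
    scanning and data copying took zero time while the remainder
    (coarse-mesh acceleration and other steps) stayed as measured.
    """
    remainder = total - scan_and_copy
    return remainder, remainder / total, total / remainder


for name, (total, scan) in TABLE3.items():
    rem, frac, bound = amdahl_view(total, scan)
    print(f"{name:32s} remainder {rem:6.1f} s ({frac:6.2%}), bound {bound:6.1f}x")
```

For the optimized single-precision Quadro K6000 run, the remainder is about 122.7 s, roughly 6% of the module time. Under this assumption, further acceleration of ray scanning alone could shorten the transport module by at most about 16x.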

Table 4. Comparison of Numerical Results of Multilayer Sleeve Fuel Cell

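Hardware	Optimized	k_eff	Resonance module time/s	Transport module time/s	Ray-scanning time per group/ms
CPU serial	No	1.518689	13.4 (13.3)	3178.2 (3177.9)③	699.8
Quadro K6000	No	1.518689	1.6 (1.6)	274.4 (274.1)	60.5
Quadro K6000	Yes	1.518689	1.6 (1.5)	262.1 (261.8)	57.7
Tesla K20c	No	1.518689	4.1 (4.0)	433.7 (433.5)	95.9
Tesla K20c	Yes	1.518689	4.3 (4.0)	417.6 (417.5)	92.4
Note: ③ Times in parentheses are the sum of the module's ray-scanning time and the CPU/GPU data-copy time; they exclude coarse-mesh acceleration and other steps.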
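
Table 4 isolates the effect of the optimized iteration strategy on each card. The relative saving, (t_unoptimized - t_optimized) / t_unoptimized, can be computed directly from the table:

```python
# (not optimized, optimized) times from Table 4, multilayer sleeve fuel cell
TABLE4 = {
    "Quadro K6000": {"transport_s": (274.4, 262.1), "scan_per_group_ms": (60.5, 57.7)},
    "Tesla K20c": {"transport_s": (433.7, 417.6), "scan_per_group_ms": (95.9, 92.4)},
}


def reduction(before: float, after: float) -> float:
    """Relative time saved by the optimization, as a fraction of the original time."""
    return (before - after) / before


for card, metrics in TABLE4.items():
    parts = [f"{name} -{reduction(*pair):.1%}" for name, pair in metrics.items()]
    print(card, ", ".join(parts))
```

The gains for this problem are about 3.7% to 4.5% of transport-module time. That is smaller than for the hexagonal plate assembly in Table 3, where the Quadro K6000 in single precision drops from 2259.7 s to 2017.4 s, about 10.7%.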
