Research on Reactor Information Extraction Method Based on ROERE Model

LI Cong; LI Sijia; XU Haoran; YAN Xiong

doi:10.13832/j.jnpe.2024.03.0252

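1. Science and Technology on Reactor System Design Technology Laboratory, Nuclear Power Institute of China, Chengdu, 610213, China
2. Sichuan University, Chengdu, 610065, China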

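Author biography: LI Cong (born 1986), male, associate research librarian; his current research covers knowledge engineering, knowledge management, and data management. E-mail: 406308577@qq.com

CLC number: TL334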
Publication history
- Received: 2024-02-04
- Revised: 2024-04-04
- Published: 2024-06-13

Abstract

Abstract: Texts in the reactor design field contain a wealth of valuable information to be mined, but their unstructured storage format makes information extraction very difficult. Traditional information extraction methods based on hand-crafted rules are inefficient on complex data, so artificial intelligence techniques are needed to overcome these problems. Focusing on text data for the main reactor equipment, this paper analyzes the characteristics of the data and identifies the single entity overlap problem that information extraction faces. Relation information and a relation-oriented module are added to the CasRel model to obtain the improved ROERE model. Experiments comparing different models show that fusing relation information and the relation-oriented module into the model is an effective improvement strategy: triples are identified and predicted more accurately and more completely, which raises the precision and recall of information extraction for main reactor equipment.

Keywords:
- ROERE model
- Information extraction
- Reactor design texts
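
The precision and recall named in the abstract are usually computed over sets of extracted (head, relation, tail) triples. The following is a minimal sketch that assumes exact-match scoring, a common convention that this page does not confirm for the paper. The function name `triple_precision_recall` and the sample triples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_precision_recall(gold: Set[Triple], pred: Set[Triple]) -> Tuple[float, float]:
    """Return (precision, recall); a predicted triple counts only on exact match."""
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical triples for illustration only (not taken from the paper's dataset).
gold = {
    ("steam generator", "contains", "tube bundle"),
    ("steam generator", "located_in", "primary loop"),
    ("pressurizer", "connected_to", "hot leg"),
}
pred = {
    ("steam generator", "contains", "tube bundle"),
    ("pressurizer", "connected_to", "hot leg"),
}

precision, recall = triple_precision_recall(gold, pred)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case every predicted triple is correct but one gold triple is missed, so precision is perfect while recall is not. That is the kind of gap a recall-oriented improvement aims to close.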

Figure 1. Single Entity Overlap Phenomenon
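
Single entity overlap means that several triples in one sentence share one entity while their entity pairs differ. The figure itself is not reproduced here, so the following is only a sketch of how such cases could be detected in a list of triples. The helper `single_entity_overlaps` and the example triples are hypothetical, not taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def single_entity_overlaps(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Map each entity to its triples when those triples involve different entity pairs."""
    by_entity: Dict[str, List[Triple]] = defaultdict(list)
    for head, rel, tail in triples:
        by_entity[head].append((head, rel, tail))
        if tail != head:
            by_entity[tail].append((head, rel, tail))

    shared: Dict[str, List[Triple]] = {}
    for entity, group in by_entity.items():
        # Distinct unordered entity pairs that this entity takes part in.
        pairs = {frozenset((h, t)) for h, _, t in group}
        if len(pairs) > 1:
            shared[entity] = group
    return shared


# Hypothetical example: one piece of equipment appears in two triples.
triples = [
    ("reactor pressure vessel", "material", "low-alloy steel"),
    ("reactor pressure vessel", "design_life", "60 years"),
    ("pressurizer", "heating_method", "electric heating"),
]

overlaps = single_entity_overlaps(triples)
for entity, group in overlaps.items():
    print(entity, "is shared by", len(group), "triples")
```

Sentences of this kind are hard for pipelines that assign one relation per entity, which is why cascade-style taggers such as CasRel extract tails separately for each head and relation.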

Figure 2. ROERE Model Framework

$\boldsymbol{v}_{\text{head}}^{i}$: feature vector of the $i$-th head entity; $\boldsymbol{v}_{\text{rel}}$: relation feature vector; $\boldsymbol{H}_{\text{text}}$: text semantic feature vectors; Linear: linear layer; dropout: random dropout layer; ReLU: nonlinear activation function

Table 1. Experimental Environment Parameters

| Item | Configuration |
| --- | --- |
| Operating system | Linux |
| Memory | 24 GB |
| Programming environment | Python 3.8 |
| GPU | RTX 6000 |
| Deep learning framework | PyTorch 1.8.1 |

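Table 2. Model Hyperparameter Settings

| Parameter | Value |
| --- | --- |
| Maximum sequence length | 300 |
| Batch size | 1 |
| Learning rate | 10⁻⁵ |
| Dropout rate | 0.3 |
| Training epochs | 25 |
| BERT encoding dimension | 768 |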
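
For reproducibility, the Table 2 settings could be collected in one configuration object. The following is a minimal sketch. The values come from the table, but the class name `TrainConfig` and all field names are assumptions made for illustration, not identifiers from the authors' code.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters from Table 2 (field names are hypothetical)."""
    max_seq_len: int = 300       # maximum sequence length
    batch_size: int = 1          # batch size
    learning_rate: float = 1e-5  # learning rate
    dropout: float = 0.3         # dropout rate
    epochs: int = 25             # number of training epochs
    bert_hidden_size: int = 768  # BERT encoding dimension


config = TrainConfig()
print(asdict(config))
```

A frozen dataclass keeps the settings immutable during a run, so the values logged at the start are the values actually used.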

Table 3. Experiment Results of Extraction Performance for Different Models

| Model | Precision (P)/% | Recall (R)/% | F₁/% |
| --- | --- | --- | --- |
| CasRel | 64.23 | 55.70 | 59.66 |
| CasRel-1 | 82.14 | 58.23 | 68.15 |
| ROERE | 80.71 | 71.52 | 75.84 |
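
F₁ is the harmonic mean of precision and recall, so the F₁ column in Table 3 can be checked directly from the P and R columns. The following short sketch does that. The function name `f1_score` and the dictionary layout are chosen here for illustration, while the numbers are the ones reported in the table.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1) in percent, copied from Table 3.
reported = {
    "CasRel": (64.23, 55.70, 59.66),
    "CasRel-1": (82.14, 58.23, 68.15),
    "ROERE": (80.71, 71.52, 75.84),
}

for model, (p, r, f1) in reported.items():
    computed = f1_score(p, r)
    print(f"{model}: computed F1 = {computed:.2f}, reported F1 = {f1:.2f}")
```

The recomputed values agree with the reported ones to two decimal places. The table also shows that ROERE gives up about 1.4 points of precision relative to CasRel-1 but gains about 13.3 points of recall, which accounts for its higher F₁.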
