Research on Reactor Information Extraction Method Based on ROERE Model
Abstract: Texts in the reactor design field contain a wealth of valuable information to be mined, yet their unstructured storage form makes information extraction very difficult. Traditional information extraction methods based on hand-crafted rules are inefficient on complex data, so artificial intelligence techniques are needed to overcome these problems. This paper focuses on text data for main reactor equipment, analyzes the characteristics of the data, and identifies the single entity overlap problem faced in information extraction. Relation information and a relation-oriented module are added to the CasRel model to obtain the improved ROERE model. Experiments comparing different models show that fusing relation information and the relation-oriented module into the model is an effective improvement strategy: triples are identified and predicted more accurately and more completely, which raises the precision and recall of information extraction for main reactor equipment.
Key words: ROERE model; information extraction; reactor design texts
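Table 1. Experimental Environment Parameters

| Experimental environment | Configuration |
| --- | --- |
| Operating system | Linux |
| Memory | 24 GB |
| Programming environment | Python 3.8 |
| GPU | RTX 6000 |
| Deep learning framework | PyTorch 1.8.1 |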
Table 2. Model Hyperparameter Settings
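| Parameter | Value |
| --- | --- |
| Maximum sequence length | 300 |
| Batch size | 1 |
| Learning rate | 10⁻⁵ |
| Dropout rate | 0.3 |
| Training epochs | 25 |
| BERT encoding dimension | 768 |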
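For readers who want to reproduce the setup, the hyperparameters in Table 2 could be collected in a single configuration object. The following is a minimal sketch only: the class name `TrainConfig` and all field names are hypothetical and do not come from the paper; only the values are taken from Table 2.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the Table 2 values (names are assumptions)."""
    max_seq_len: int = 300        # maximum sequence length
    batch_size: int = 1           # batch size
    learning_rate: float = 1e-5   # learning rate
    dropout: float = 0.3          # dropout rate
    epochs: int = 25              # training epochs
    bert_hidden_size: int = 768   # BERT encoding dimension


config = TrainConfig()
print(asdict(config))
```

Freezing the dataclass keeps the reported settings from being changed by accident during an experiment run; a variant run would create a new instance with explicit overrides.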
Table 3. Experiment Results of Extraction Performance for Different Models
| Model | Precision (P)/% | Recall (R)/% | F1/% |
| --- | --- | --- | --- |
| CasRel | 64.23 | 55.70 | 59.66 |
| CasRel-1 | 82.14 | 58.23 | 68.15 |
| ROERE | 80.71 | 71.52 | 75.84 |
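The F1 column in Table 3 can be checked against the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of P and R, which the page itself does not state; the function name `f1_score` and the `reported` dictionary are illustrative, and the numbers are copied from Table 3.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (P, R, reported F1), all in percent, copied from Table 3
reported = {
    "CasRel": (64.23, 55.70, 59.66),
    "CasRel-1": (82.14, 58.23, 68.15),
    "ROERE": (80.71, 71.52, 75.84),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}, reported F1 = {f1:.2f}")
```

Under this assumption the computed values agree with the reported F1 scores to two decimal places. ROERE has slightly lower precision than CasRel-1 but much higher recall, which is what lifts its F1.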