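Basic information:
- Patent title (Chinese): 文本内涵质量的评估方法、装置、设备及存储介质
- Patent title (English): TEXT CONTENT QUALITY EVALUATION METHOD, APPARATUS AND DEVICE, AND STORAGE MEDIUM
- Application number: PCT/CN2020/131673 Filing date: 2020-11-26
- Publication number: WO2021139424A1 Publication date: 2021-07-15
- Inventor: Tang Rui (唐蕊)
- Applicant: Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
- Applicant address: 23F, Ping An Financial Center, No. 5033 Yitian Road, Fu'an Community, Futian Street, Futian District, Shenzhen, Guangdong 518033, China
- Patentee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
- Current patentee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
- Current patentee address: 23F, Ping An Financial Center, No. 5033 Yitian Road, Fu'an Community, Futian Street, Futian District, Shenzhen, Guangdong 518033, China
- Agency: Beijing Jingda Law Firm (北京市京大律师事务所)
- Priority: CN202010405915.6 2020-05-14
- Main classification: G06F40/216
- IPC classifications: G06F40/216 ; G06F40/289 ; G06F40/253 ; G06F40/30 ; G06K9/6267 ; G06N20/00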
Disclosed are a text content quality evaluation method, apparatus and device, and a storage medium, which relate to the technical field of artificial intelligence, and are used for improving the accuracy of text content quality evaluation. The method comprises: acquiring initial text from preset medical record text, wherein the initial text comprises chief complaint information, existing medical history information, physical examination information, first-time disease course record information, disease course record information, ward round record information and operation record information (101); performing text pre-processing on the initial text by means of a natural language processing algorithm to obtain target text (102); performing text coding on the target text by means of a preset bag-of-words model and a preset automatic coding model to obtain a first text feature (103); performing feature extraction on the target text to obtain second text features, wherein the second text features comprise a text complexity feature, a text syntax style feature and a medical semantic feature, and the feature extraction comprises calculating the number of each type of word, the ratio of each type of symbol and the ratio of each type of word (104); and performing evaluation processing on the first text feature and the second text features by means of a trained logistic regression model to obtain an evaluation result, wherein the evaluation result is used for identifying a content quality grade of the preset medical record text (105). The method also relates to blockchain technology, and the target text is stored in a blockchain.
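As a rough illustration of steps (103) and (104) in the abstract, the following minimal Python sketch computes a bag-of-words count vector and a few count and ratio features. It is not the patented implementation: the vocabulary, the medical term list, the feature names and the English-only tokenizer are all hypothetical choices made here for illustration, and the automatic coding model of step (103) and the trained logistic regression model of step (105) are left out.

```python
import re
from collections import Counter

# Hypothetical vocabulary and medical term list (not disclosed in the abstract).
VOCAB = ["fever", "cough", "pain", "denies", "chest"]
MEDICAL_TERMS = {"fever", "cough", "dyspnea"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens.

    This is an English-only simplification; real medical records in
    Chinese would need a proper word segmenter.
    """
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(tokens, vocab=VOCAB):
    """Return a count vector over a fixed vocabulary (bag-of-words)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocab]


def surface_features(text):
    """Return counts and ratios of word types and symbol types."""
    tokens = tokenize(text)
    n_tokens = len(tokens)
    n_chars = len(text)
    n_punct = sum(1 for ch in text if ch in ",.;:!?")
    n_digits = sum(1 for ch in text if ch.isdigit())
    n_medical = sum(1 for tok in tokens if tok in MEDICAL_TERMS)
    return {
        "token_count": n_tokens,
        "medical_term_count": n_medical,
        # Ratios fall back to 0.0 for empty input to avoid division by zero.
        "medical_term_ratio": n_medical / n_tokens if n_tokens else 0.0,
        "punct_ratio": n_punct / n_chars if n_chars else 0.0,
        "digit_ratio": n_digits / n_chars if n_chars else 0.0,
    }


if __name__ == "__main__":
    sample = "Fever and cough for 3 days. Denies chest pain."
    print(bag_of_words(tokenize(sample)))
    print(surface_features(sample))
```

In a fuller pipeline, the count vector would be passed through the encoder and then concatenated with the handcrafted features before being fed to a classifier; how the patent actually combines them is not specified in the abstract.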
IPC structure map:
G06F40/216 | Using statistical methods |