Patent application
US20120304055A1 DOCUMENT ANALYSIS APPARATUS, DOCUMENT ANALYSIS METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
In force
Basic information:
- Title: DOCUMENT ANALYSIS APPARATUS, DOCUMENT ANALYSIS METHOD, AND COMPUTER-READABLE RECORDING MEDIUM
- Application number: US13576669; filing date: 2011-01-25
- Publication number: US20120304055A1; publication date: 2012-11-29
- Inventors: Satoshi Nakazawa, Shinichi Ando
- Applicants: Satoshi Nakazawa, Shinichi Ando
- Applicant address: JP Tokyo
- Assignee: NEC CORPORATION
- Current assignee: NEC CORPORATION
- Current assignee address: JP Tokyo
- Priority: JP2010-029392, 2010-02-12
- International application: PCT/JP2011/051277 WO, 2011-01-25
- Main classification: G06F17/24
- IPC classification: G06F17/24
Abstract:
A document analysis apparatus comprises: a feature expression acquisition unit that acquires a feature expression appearing during an attention period in an analysis object document collection; a document collection acquisition unit that acquires, from an analysis population including the analysis object document collection, a collection of feature expression containing documents (FECDs) in which the feature expression appears; a context determination unit that, for every feature expression, specifies the analysis/FECDs (the FECDs that correspond to analysis object documents) and specifies a context in which the feature expression appears in multiple analysis/FECDs; a context comparison determination unit that specifies the non-analysis/FECDs (the FECDs that do not correspond to analysis object documents) and, within them, compares the context in which the feature expression appears with the previously specified context; and a feature degree setting unit that, based on the comparison, assigns a feature degree to the feature expression or performs a similar operation.
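The flow of units in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not define how a context is represented, how contexts are compared, or how the feature degree is computed. The sketch assumes that a context is the set of words within a small window around a single-token, lowercase expression, that the shared context is the set of words seen in at least `min_docs` analysis/FECDs, and that the feature degree rises as non-analysis/FECDs share less of that context. The names `Doc`, `context_words`, `feature_degree`, `window`, and `min_docs` are hypothetical, and the feature expression is taken as given rather than acquired from an attention period.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    in_analysis: bool  # True if the document is an analysis object document


def context_words(text, expression, window=3):
    """Words within `window` tokens of each occurrence of the expression."""
    tokens = text.lower().split()
    found = set()
    for i, tok in enumerate(tokens):
        if tok == expression:
            found.update(tokens[max(0, i - window):i])
            found.update(tokens[i + 1:i + 1 + window])
    return found


def feature_degree(expression, population, min_docs=2, window=3):
    # FECD collection: documents in the population containing the expression.
    fecds = [d for d in population if expression in d.text.lower().split()]
    analysis = [d for d in fecds if d.in_analysis]
    non_analysis = [d for d in fecds if not d.in_analysis]

    # Context shared by multiple analysis/FECDs: words seen near the
    # expression in at least `min_docs` of them (assumed representation).
    counts = {}
    for d in analysis:
        for w in context_words(d.text, expression, window):
            counts[w] = counts.get(w, 0) + 1
    shared = {w for w, c in counts.items() if c >= min_docs}
    if not shared:
        return 0.0  # no common context found in the analysis collection
    if not non_analysis:
        return 1.0  # the expression never appears outside the analysis collection

    # Compare the context in each non-analysis/FECD with the shared context.
    overlaps = [
        len(shared & context_words(d.text, expression, window)) / len(shared)
        for d in non_analysis
    ]
    # Assumed scoring: less overlap outside the analysis collection
    # gives a higher feature degree.
    return 1.0 - sum(overlaps) / len(overlaps)
```

Under these assumptions, an expression whose surrounding words in the analysis collection rarely appear around it elsewhere in the population scores close to 1, and an expression used in the same context everywhere scores close to 0.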