    • 2. Granted invention patent
    • Versatile page number detector
    • US07797622B2
    • 2010-09-14
    • US11599947
    • 2006-11-15
    • Hervé Déjean; Jean-Luc Meunier
    • Hervé Déjean; Jean-Luc Meunier
    • G06F17/27
    • G06F17/30569; G06K9/00469
    • A method for detection of page numbers in a document includes identifying a plurality of text fragments associated with a plurality of pages of a document. From the identified text fragments, at least one sequence is identified. Each identified sequence includes a plurality of terms. Each term of the sequence is derived from a text fragment selected from the plurality of text fragments. The terms of an identified sequence comply with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. A subset of the identified sequences which cover at least some of the pages of the document is computed. Terms of at least some of the subset of the identified sequences are construed as page numbers of pages of the document. Additional page numbers may be identified by considering one or more features of the terms in the subset of identified sequences.
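The sequence-based idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes a single numbering scheme (Arabic numerals increasing by one on consecutive pages) and a simple greedy choice of the covering subset. The names `arabic_value`, `find_sequences`, and `select_cover` are hypothetical.

```python
import re


def arabic_value(fragment):
    """Return the integer value of a fragment made only of digits, else None."""
    text = fragment.strip()
    return int(text) if re.fullmatch(r"\d+", text) else None


def find_sequences(pages, min_length=2):
    """Find runs of fragments whose values increase by one on consecutive pages.

    pages: list of lists of text fragments, one inner list per page.
    Returns a list of sequences; each sequence is a list of (page_index, value).
    """
    sequences = []
    open_runs = {}  # value expected on the next page -> run collected so far
    for page_index, fragments in enumerate(pages):
        values = {v for v in map(arabic_value, fragments) if v is not None}
        next_runs = {}
        for value in values:
            run = open_runs.pop(value, [])
            next_runs[value + 1] = run + [(page_index, value)]
        # Runs that were not extended on this page are closed here.
        sequences.extend(r for r in open_runs.values() if len(r) >= min_length)
        open_runs = next_runs
    sequences.extend(r for r in open_runs.values() if len(r) >= min_length)
    return sequences


def select_cover(sequences):
    """Greedy stand-in for the covering step: longest sequences win,
    and each page receives at most one page number."""
    page_numbers = {}
    for seq in sorted(sequences, key=len, reverse=True):
        if all(page not in page_numbers for page, _ in seq):
            for page, value in seq:
                page_numbers[page] = value
    return page_numbers


# Toy document: four pages, each a list of extracted text fragments.
pages = [
    ["Intro", "1", "2006"],
    ["text", "2"],
    ["3", "14"],
    ["more", "4", "15"],
]
numbers = select_cover(find_sequences(pages))
```

In this toy input the run 1, 2, 3, 4 spans all four pages, so the shorter run 14, 15 is discarded by the greedy step. Supporting Roman numerals or composite forms such as "A-3" would mean adding further scheme functions alongside `arabic_value`.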
    • 3. Granted invention patent
    • Table of contents extraction with improved robustness
    • US07743327B2
    • 2010-06-22
    • US11360963
    • 2006-02-23
    • Jean-Luc Meunier; Hervé Déjean
    • Jean-Luc Meunier; Hervé Déjean
    • G06F17/21
    • G06F17/2745
    • In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. There are identified (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of contents entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of contents entries. During the identifying, a number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to distribution of the linked text fragments.
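The steps in the abstract above (link candidate entries to body fragments, reduce the candidates, validate the result) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: links are exact matches after normalization, the reduction criterion is "keep only the first later match", and the validation criterion is "targets lie after the table of contents and appear in increasing order". The names `normalize` and `find_toc` are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and strip trailing dot leaders and page numbers."""
    return re.sub(r"[.\s]*\d*$", "", text).strip().lower()


def find_toc(fragments, min_entries=2):
    """Return (toc_indices, links) for the longest contiguous run of linked
    fragments, or ([], {}) if the run fails validation."""
    keys = [normalize(f) for f in fragments]

    # Link each fragment to later fragments with the same normalized text.
    links = {}
    for i, key in enumerate(keys):
        if not key:
            continue
        targets = [j for j in range(i + 1, len(keys)) if keys[j] == key]
        if targets:
            # Assumed reduction criterion: keep only the first later match.
            links[i] = targets[0]

    # The TOC candidate is the longest contiguous run of linked fragments.
    best, run = [], []
    for i in range(len(fragments)):
        if i in links:
            run.append(i)
        else:
            if len(run) > len(best):
                best = run
            run = []
    if len(run) > len(best):
        best = run

    # Assumed validation criterion on the distribution of linked fragments:
    # all targets follow the TOC and keep the order of the entries.
    targets = [links[i] for i in best]
    valid = (
        len(best) >= min_entries
        and targets[0] > best[-1]
        and targets == sorted(targets)
    )
    return (best, {i: links[i] for i in best}) if valid else ([], {})


fragments = [
    "Contents",
    "Introduction .... 1",
    "Methods .... 3",
    "Results .... 7",
    "Introduction",
    "Some body text.",
    "Methods",
    "More text.",
    "Results",
    "Final text.",
]
toc, links = find_toc(fragments)
```

Here fragments 1 to 3 are reported as table of contents entries, linked to the headings at positions 4, 6, and 8. A more robust variant would use fuzzy text similarity instead of exact matching.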
    • 7. Invention patent application
    • Versatile page number detector
    • US20080114757A1
    • 2008-05-15
    • US11599947
    • 2006-11-15
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F7/20; G06F17/30
    • G06F17/30569; G06K9/00469
    • A method for detection of page numbers in a document includes identifying a plurality of text fragments associated with a plurality of pages of a document. From the identified text fragments, at least one sequence is identified. Each identified sequence includes a plurality of terms. Each term of the sequence is derived from a text fragment selected from the plurality of text fragments. The terms of an identified sequence comply with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. A subset of the identified sequences which cover at least some of the pages of the document is computed. Terms of at least some of the subset of the identified sequences are construed as page numbers of pages of the document. Additional page numbers may be identified by considering one or more features of the terms in the subset of identified sequences.