    • 1. Granted invention patent
    • Method and apparatus for structuring documents utilizing recognition of an ordered sequence of identifiers
    • US07991709B2
    • 2011-08-02
    • US12020743
    • 2008-01-28
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F15/18; G06F17/21; G06F17/22; G06F17/27
    • G06F17/211
    • A method is provided for operating a computing device to create a document structure model of a computer parsable text document utilizing recognition of at least one ordered sequence of identifiers in the document. The method includes converting a computer parsable text document of any format to an alternative structured language format to form a converted document. The text of the converted document is fragmented into an ordered sequence of text fragments within a text format. The text fragments are enumerated to obtain a sequence of terms. At least one optimal sub-sequence of terms is identified from among the sequence of terms, with an optimal sub-sequence being one or more longest increasing sub-sequence(s). The computer parsable text document is annotated with tags, with the tags including information derived from identification of the optimal sub-sequence(s). The annotated document is displayed on the graphical user interface.
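The abstract above names the longest increasing sub-sequence as the optimal sub-sequence of enumerated terms. The following is a minimal sketch of that one step only, assuming the text fragments have already been enumerated into integers; the function name `longest_increasing_subsequence` and the sample data are illustrative and do not come from the patent.

```python
from bisect import bisect_left


def longest_increasing_subsequence(terms):
    """Return the indices of one longest strictly increasing subsequence."""
    tail_idx = []   # tail_idx[k]: index of the smallest tail of a run of length k + 1
    tail_val = []   # tail_val[k]: the value stored at tail_idx[k]
    prev = [-1] * len(terms)  # back-pointers used to rebuild the subsequence

    for i, value in enumerate(terms):
        k = bisect_left(tail_val, value)  # length of the run this value extends
        if k > 0:
            prev[i] = tail_idx[k - 1]
        if k == len(tail_idx):
            tail_idx.append(i)
            tail_val.append(value)
        else:
            tail_idx[k] = i
            tail_val[k] = value

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        result.append(i)
        i = prev[i]
    return result[::-1]


# Hypothetical enumerated terms: section numbers mixed with stray numerals.
terms = [1, 2, 7, 3, 1, 4, 5]
indices = longest_increasing_subsequence(terms)
```

In this sample the stray `7` and the repeated `1` fall outside the returned sub-sequence, which is the kind of noise an ordered sequence of identifiers would be expected to exclude.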
    • 3. Invention patent application
    • Versatile page number detector
    • US20080114757A1
    • 2008-05-15
    • US11599947
    • 2006-11-15
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F7/20; G06F17/30
    • G06F17/30569; G06K9/00469
    • A method for detection of page numbers in a document includes identifying a plurality of text fragments associated with a plurality of pages of a document. From the identified text fragments, at least one sequence is identified. Each identified sequence includes a plurality of terms. Each term of the sequence is derived from a text fragment selected from the plurality of text fragments. The terms of an identified sequence comply with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. A subset of the identified sequences which cover at least some of the pages of the document is computed. Terms of at least some of the subset of the identified sequences are construed as page numbers of pages of the document. Additional page numbers may be identified by considering one or more features of the terms in the subset of identified sequences.
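The idea of sequences whose terms follow a numbering scheme with a fixed increment can be illustrated with a much-reduced sketch. It assumes a single numbering scheme (Arabic numerals increasing by one per page) and keeps only the single longest sequence, whereas the abstract describes several schemes and a covering subset of sequences. The function name `detect_page_numbers` and the sample pages are hypothetical.

```python
import re


def detect_page_numbers(pages):
    """pages: list of lists of text fragments, one inner list per page.

    Returns {page_index: number} for the longest run of consecutive pages
    whose fragments contain n, n + 1, n + 2, ...
    """
    # Candidate terms: fragments that consist of digits only.
    candidates = [
        {int(frag) for frag in frags if re.fullmatch(r"\d+", frag.strip())}
        for frags in pages
    ]

    best = {}
    for start, values in enumerate(candidates):
        for value in values:
            run = {}
            page, expected = start, value
            # Extend the sequence while the next page carries the next number.
            while page < len(candidates) and expected in candidates[page]:
                run[page] = expected
                page += 1
                expected += 1
            if len(run) > len(best):
                best = run
    return best


# Hypothetical fragments: "2005" recurs in a footer but does not increment.
pages = [
    ["Intro", "12"],
    ["2005", "13", "text"],
    ["14", "2005"],
    ["no number here"],
]
numbers = detect_page_numbers(pages)
```

The recurring `2005` is rejected because it never increments from one page to the next, which is the property the numbering scheme is meant to test.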
    • 9. Granted invention patent
    • Structuring document based on table of contents
    • US08302002B2
    • 2012-10-30
    • US11116100
    • 2005-04-27
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F17/00
    • G06F17/30616; G06F17/2241
    • A document is organized as a plurality of nodes associated with a table of contents. The nodes are clustered into a plurality of clusters based on a similarity criterion. One of the clusters is identified as corresponding to a highest or lowest level of the table of contents based on a selection criterion. The highest or lowest level is assigned to the nodes belonging to the identified cluster. The identifying and assigning are repeated to assign levels to the nodes belonging to each next highest or lowest level of the table of contents. The repeated identifying is based on the selection criteria applied disregarding nodes that have already been assigned a level. The document is structured based at least in part on the levels assigned to the table of contents nodes.
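The cluster-then-assign loop in the abstract above can be sketched as follows. Both criteria are assumptions made for illustration and are not taken from the patent: the similarity criterion is "identical style label", and the selection criterion is "the cluster containing the earliest node that has no level yet is the next highest level". The function name `assign_toc_levels` is likewise hypothetical.

```python
from collections import defaultdict


def assign_toc_levels(nodes):
    """nodes: list of (title, style) pairs in document order.

    Returns a list of levels (1 = highest), one per node.
    """
    # Similarity criterion (assumed): nodes with the same style form a cluster.
    clusters = defaultdict(list)
    for index, (_, style) in enumerate(nodes):
        clusters[style].append(index)

    levels = [None] * len(nodes)
    level = 1
    for index, (_, style) in enumerate(nodes):
        # Selection criterion (assumed): the first node without a level
        # identifies the cluster for the next level; assigned nodes are skipped.
        if levels[index] is None:
            for member in clusters[style]:
                levels[member] = level
            level += 1
    return levels


# Hypothetical table-of-contents nodes with a style label per entry.
toc_nodes = [
    ("Chapter 1", "bold"),
    ("1.1 Scope", "plain"),
    ("1.1.1 Detail", "italic"),
    ("Chapter 2", "bold"),
    ("2.1 Terms", "plain"),
]
toc_levels = assign_toc_levels(toc_nodes)
```

Skipping nodes that already hold a level mirrors the abstract's statement that the repeated identification disregards nodes that have already been assigned one.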
    • 10. Invention patent application
    • Table of contents extraction with improved robustness
    • US20070196015A1
    • 2007-08-23
    • US11360963
    • 2006-02-23
    • Jean-Luc Meunier; Herve Dejean
    • Jean-Luc Meunier; Herve Dejean
    • G06K9/46; G06F7/00; G06K9/34; G06F17/00
    • G06F17/2745
    • In a method for identifying a table of contents in a document (10), text fragments are extracted (12) from the document. There are identified (20, 30, 34, 38): (i) a substantially contiguous group of text fragments as table of contents entries and (ii) a different group of text fragments as linked text fragments linked with corresponding table of contents entries. During the identifying, a number of text fragments that are candidates for identification as linked text fragments is reduced based on at least one reduction criterion (130). The identified table of contents entries and linked text fragments (110) are validated based on at least one validation criterion (162) related to distribution of the linked text fragments.
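The link, reduce, and validate steps in the abstract above can be illustrated with a rough sketch. Every concrete criterion here is an assumption chosen for brevity, not the patent's: linking is by case-insensitive exact text match, the reduction criterion limits candidates to fragments after the table of contents and after the previous link, and the validation criterion requires at least half of the entries to be linked. The function name `link_toc_entries` and its `min_linked` parameter are hypothetical.

```python
def link_toc_entries(fragments, toc_start, toc_end, min_linked=0.5):
    """Link fragments[toc_start:toc_end] to matching fragments in the body.

    Returns {toc_index: target_index}, or {} when validation fails.
    """
    links = {}
    # Reduction (assumed): only fragments after the TOC are candidates.
    search_from = toc_end
    for toc_index in range(toc_start, toc_end):
        key = fragments[toc_index].strip().lower()
        for target in range(search_from, len(fragments)):
            if fragments[target].strip().lower() == key:
                links[toc_index] = target
                # Reduction (assumed): later entries must link further down.
                search_from = target + 1
                break

    # Validation (assumed): enough entries must have found a target.
    entry_count = max(1, toc_end - toc_start)
    return links if len(links) / entry_count >= min_linked else {}


# Hypothetical fragments: three TOC entries followed by the body text.
fragments = [
    "Introduction", "Methods", "Results",
    "some text", "Introduction", "body", "Methods", "body", "Results", "end",
]
links = link_toc_entries(fragments, 0, 3)
```

Advancing `search_from` after each link keeps the targets in document order, one simple way to constrain how the linked fragments are distributed.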