    • 3. Granted invention patent
    • Methods and apparatuses for intra-document reference identification and resolution
    • US08352857B2
    • 2013-01-08
    • US12258627
    • 2008-10-27
    • Katja Filippova; Herve Dejean
    • Katja Filippova; Herve Dejean
    • G06F17/00
    • G06F17/2235
    • Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.
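The pairing and associating steps described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the abstract says reference profiles are abstracted from the document, whereas this sketch assumes a fixed, hypothetical pattern (`PROFILE_RE`) covering only "Figure", "Table" and "Section" followed by a number, and it assumes an object text fragment is one that begins with such a pattern, as a caption would. The function name `resolve_references` is likewise an illustrative choice.

```python
import re

# Hypothetical, hard-coded profile pattern: object type identifier + reference number.
PROFILE_RE = re.compile(r"\b(Figure|Table|Section)\s+(\d+)")


def resolve_references(fragments):
    """Return (reference_index, object_index) pairs for a list of text fragments."""
    objects = {}  # (object type, reference number) -> index of the object fragment

    # Pairing step: a fragment that starts with a profile (caption-like) is
    # taken as the object text fragment for that profile.
    for index, text in enumerate(fragments):
        match = PROFILE_RE.match(text)
        if match:
            objects.setdefault((match.group(1), match.group(2)), index)

    # Associating step: any other fragment that mentions a profile is a
    # reference text fragment and is linked to the paired object fragment.
    links = []
    for index, text in enumerate(fragments):
        for match in PROFILE_RE.finditer(text):
            target = objects.get((match.group(1), match.group(2)))
            if target is not None and target != index:
                links.append((index, target))
    return links


fragments = [
    "Figure 1: System overview",
    "As shown in Figure 1, the parser reads Table 2.",
    "Table 2: Results",
]
links = resolve_references(fragments)
```

In this example the second fragment is linked to both the figure caption and the table caption, while the captions themselves are not treated as references to themselves.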
    • 4. Granted invention patent
    • Method and apparatus for structuring documents utilizing recognition of an ordered sequence of identifiers
    • US07991709B2
    • 2011-08-02
    • US12020743
    • 2008-01-28
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F15/18; G06F17/21; G06F17/22; G06F17/27
    • G06F17/211
    • A method is provided for operating a computing device to create a document structure model of a computer parsable text document utilizing recognition of at least one ordered sequence of identifiers in the document. The method includes converting a computer parsable text document of any format to an alternative structured language format to form a converted document. The text of the converted document is fragmented into an ordered sequence of text fragments within a text format. The text fragments are enumerated to obtain a sequence of terms. At least one optimal sub-sequence of terms is identified from among the sequence of terms, with an optimal sub-sequence being one or more longest increasing sub-sequence(s). The computer parsable text document is annotated with tags, with the tags including information derived from identification of the optimal sub-sequence(s). The annotated document is displayed on the graphical user interface.
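The abstract above names the longest increasing sub-sequence as the optimal sub-sequence of terms. The following is a minimal sketch of one standard way to compute such a sub-sequence (patience sorting with binary search); the abstract does not say which algorithm the patent uses. The input is assumed to be the integer terms already obtained by enumerating the text fragments, and the function name is illustrative.

```python
from bisect import bisect_left


def longest_increasing_subsequence(terms):
    """Return one longest strictly increasing subsequence of `terms`."""
    tail_indices = []  # tail_indices[k]: index of the smallest tail of a run of length k + 1
    tail_values = []   # values at those indices, kept sorted for bisect
    previous = [-1] * len(terms)  # back-pointers for reconstructing the result

    for i, value in enumerate(terms):
        k = bisect_left(tail_values, value)
        if k == len(tail_values):
            tail_indices.append(i)
            tail_values.append(value)
        else:
            tail_indices[k] = i
            tail_values[k] = value
        previous[i] = tail_indices[k - 1] if k > 0 else -1

    # Walk the back-pointers from the tail of the longest run.
    result = []
    i = tail_indices[-1] if tail_indices else -1
    while i != -1:
        result.append(terms[i])
        i = previous[i]
    return result[::-1]


# Numbers found in document order; 7 and 12 are noise (e.g. values in body text).
terms = [1, 2, 7, 3, 4, 12, 5]
sequence = longest_increasing_subsequence(terms)
```

Here the recovered sequence is 1, 2, 3, 4, 5, which is the kind of ordered identifier run (chapter or section numbers, for instance) that the tags in the annotated document would be derived from.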
    • 5. Granted invention patent
    • Systems and methods for converting legacy and proprietary documents into extended mark-up language format
    • US07165216B2
    • 2007-01-16
    • US10756313
    • 2004-01-14
    • Boris Chidlovskii; Herve Dejean
    • Boris Chidlovskii; Herve Dejean
    • G06F15/00
    • G06F17/30914; G06F17/227
    • A system and method convert legacy and proprietary documents into extended mark-up language format by treating the conversion as a transformation of ordered trees of one schema and/or model into ordered trees of another schema and/or model. In embodiments, the tree transformers are coded using a learning method that decomposes the conversion task into three components: path re-labeling, structural composition, and input tree traversal, each of which involves learning approaches. Transforming an input tree into an output tree may involve labeling components in the input tree with valid labels or paths from a particular output schema, composing the labeled elements into an output tree with a valid structure, and finding a traversal of the input tree that achieves the correct composition of the output tree and applies structural rules.
    • 7. Invention patent application
    • TYPOGRAPHICAL BLOCK GENERATION
    • US20130321867A1
    • 2013-12-05
    • US13484708
    • 2012-05-31
    • Herve Dejean
    • Herve Dejean
    • G06K15/02
    • G06F17/211
    • Embodiments of a computer-implemented method group one or more token elements comprising one or more characters in an input file. The method comprises computing a first leading distance between a first baseline of a first token element and a second baseline of a second token element. The method further comprises defining a block with the first token element and the second token element, and characterizing the first leading distance as a leading distance of the block. The method further comprises computing a second leading distance between the second baseline and a third baseline of a third token element. The method furthermore comprises grouping the third token element into the block when a first difference between the second leading distance and the leading distance of the block lies within a first predefined threshold value.
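The grouping rule in the abstract above can be sketched as follows. This is a simplified illustration, not the claimed method: it assumes each token element is reduced to a single baseline coordinate, that baselines arrive sorted top to bottom, and that the first two tokens of a block always define the block's leading distance. The function name `group_into_blocks` and the default threshold are assumptions made for the example.

```python
def group_into_blocks(baselines, threshold=1.0):
    """Group baseline coordinates into blocks with a consistent leading distance."""
    blocks = []
    current = []    # baselines of the block being built
    leading = None  # leading distance of the current block, once known

    for y in baselines:
        if not current:
            current = [y]
        elif leading is None:
            # The first two tokens define the block and its leading distance.
            leading = y - current[-1]
            current.append(y)
        elif abs((y - current[-1]) - leading) <= threshold:
            # The new leading distance is within the threshold: same block.
            current.append(y)
        else:
            # Leading distance changed too much: close the block, start another.
            blocks.append(current)
            current = [y]
            leading = None

    if current:
        blocks.append(current)
    return blocks


# Three lines spaced 12 units apart, a larger gap, then two more lines.
blocks = group_into_blocks([100, 112, 124, 150, 162])
```

With these inputs the jump from 124 to 150 differs from the block's leading distance of 12 by more than the threshold, so a second block starts at 150.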
    • 8. Invention patent application
    • METHODS AND APPARATUSES FOR INTRA-DOCUMENT REFERENCE IDENTIFICATION AND RESOLUTION
    • US20100107045A1
    • 2010-04-29
    • US12258627
    • 2008-10-27
    • Katja Filippova; Herve Dejean
    • Katja Filippova; Herve Dejean
    • G06F17/00; G06F17/21; G06F17/30
    • G06F17/2235
    • Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.
    • 9. Invention patent application
    • Versatile page number detector
    • US20080114757A1
    • 2008-05-15
    • US11599947
    • 2006-11-15
    • Herve Dejean; Jean-Luc Meunier
    • Herve Dejean; Jean-Luc Meunier
    • G06F7/20; G06F17/30
    • G06F17/30569; G06K9/00469
    • A method for detection of page numbers in a document includes identifying a plurality of text fragments associated with a plurality of pages of a document. From the identified text fragments, at least one sequence is identified. Each identified sequence includes a plurality of terms. Each term of the sequence is derived from a text fragment selected from the plurality of text fragments. The terms of an identified sequence comply with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. A subset of the identified sequences which cover at least some of the pages of the document is computed. Terms of at least some of the subset of the identified sequences are construed as page numbers of pages of the document. Additional page numbers may be identified by considering one or more features of the terms in the subset of identified sequences.
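A heavily simplified sketch of the sequence idea in the abstract above: it assumes a single numbering scheme (ASCII Arabic numerals that increase by one per page) and picks the one sequence that covers the most pages, whereas the abstract allows several schemes and a subset of sequences. The function name `detect_page_numbers` and the input layout (one list of text fragments per page) are assumptions for illustration.

```python
from collections import defaultdict


def detect_page_numbers(pages):
    """Map page index -> detected page number for the best-covering sequence.

    `pages` is a list with one list of text fragments per page, in page order.
    """
    # A run that increases by one per page has a constant (number - page index).
    by_offset = defaultdict(dict)  # offset -> {page index: number}
    for index, fragments in enumerate(pages):
        for fragment in fragments:
            text = fragment.strip()
            if text.isascii() and text.isdigit():
                number = int(text)
                by_offset[number - index].setdefault(index, number)

    if not by_offset:
        return {}
    # Keep the sequence that covers the largest number of pages.
    return dict(max(by_offset.values(), key=len))


pages = [
    ["Title", "2008"],
    ["5", "Chapter 1"],
    ["some text", "6"],
    ["7", "2008"],
    ["no number"],
]
numbers = detect_page_numbers(pages)
```

In this example the year 2008 appears on two pages but never forms an incrementing run, so the sequence 5, 6, 7 on the second to fourth pages is selected instead.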