    • 11. Invention application
    • Structuring document based on table of contents
    • US20060248070A1
    • 2006-11-02
    • US11116100
    • 2005-04-27
    • Herve Dejean; Jean-Luc Meunier
    • G06F17/00
    • G06F17/30616; G06F17/2241
    • A document is organized as a plurality of nodes associated with a table of contents. The nodes are clustered into a plurality of clusters based on a similarity criterion. One of the clusters is identified as corresponding to a highest or lowest level of the table of contents based on a selection criterion. The highest or lowest level is assigned to the nodes belonging to the identified cluster. The identifying and assigning are repeated to assign levels to the nodes belonging to each next highest or lowest level of the table of contents. The repeated identifying is based on the selection criteria applied disregarding nodes that have already been assigned a level. The document is structured based at least in part on the levels assigned to the table of contents nodes.
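The iterative level assignment in the abstract above can be sketched as follows. This is a minimal illustration, not the patented method: the abstract leaves the similarity and selection criteria open, so the sketch assumes that entries with the same indentation form one cluster and that the cluster containing the earliest still-unassigned entry is the next highest level. The function name `assign_toc_levels` and the `(title, indent)` input shape are hypothetical.

```python
def assign_toc_levels(entries):
    """Assign a level (1 = highest) to each table-of-contents entry.

    entries: list of (title, indent) pairs in document order.
    Assumed similarity criterion: equal indentation.
    Assumed selection criterion: the cluster holding the earliest
    entry that has not been assigned a level yet.
    """
    # Cluster entry indices by indentation.
    clusters = {}
    for index, (_title, indent) in enumerate(entries):
        clusters.setdefault(indent, []).append(index)

    levels = [None] * len(entries)
    level = 1
    # Repeat identify-and-assign, disregarding clusters already handled.
    while clusters:
        chosen = min(clusters, key=lambda indent: clusters[indent][0])
        for index in clusters.pop(chosen):
            levels[index] = level
        level += 1
    return levels


toc = [("Intro", 0), ("Background", 2), ("Scope", 2), ("Method", 0), ("Details", 4)]
print(assign_toc_levels(toc))
```

Any other clustering (font, numbering pattern) or selection rule could be substituted without changing the loop structure.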
    • 12. Invention application
    • METHOD AND APPARATUS FOR STRUCTURING DOCUMENTS UTILIZING RECOGNITION OF AN ORDERED SEQUENCE OF IDENTIFIERS
    • US20090192956A1
    • 2009-07-30
    • US12020743
    • 2008-01-28
    • Herve Dejean; Jean-Luc Meunier
    • G06F17/27; G06F15/18
    • G06F17/211
    • A method is provided for operating a computing device to create a document structure model of a computer parsable text document utilizing recognition of at least one ordered sequence of identifiers in the document. The method includes converting a computer parsable text document of any format to an alternative structured language format to form a converted document. The text of the converted document is fragmented into an ordered sequence of text fragments within a text format. The text fragments are enumerated to obtain a sequence of terms. At least one optimal sub-sequence of terms is identified from among the sequence of terms, with an optimal sub-sequence being one or more longest increasing sub-sequence(s). The computer parsable text document is annotated with tags, with the tags including information derived from identification of the optimal sub-sequence(s). The annotated document is displayed on the graphical user interface.
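The core step named in the abstract above, finding a longest increasing sub-sequence among enumerated text fragments, can be sketched as follows. The enumeration rule is an assumption made for illustration: a fragment's term is the integer it starts with, and fragments without a leading integer are skipped. The helper names `leading_number`, `longest_increasing_subsequence`, and `numbered_fragments` are hypothetical.

```python
import re
from bisect import bisect_left


def leading_number(fragment):
    """Return the integer a fragment starts with, or None (assumed enumeration rule)."""
    match = re.match(r"\s*(\d+)", fragment)
    return int(match.group(1)) if match else None


def longest_increasing_subsequence(values):
    """Return the indices of one longest strictly increasing subsequence."""
    tail_index = []   # tail_index[k]: index of the smallest tail of a run of length k + 1
    tail_value = []   # values at those indices, kept sorted for bisect
    previous = [None] * len(values)
    for i, value in enumerate(values):
        k = bisect_left(tail_value, value)
        if k == len(tail_index):
            tail_index.append(i)
            tail_value.append(value)
        else:
            tail_index[k] = i
            tail_value[k] = value
        previous[i] = tail_index[k - 1] if k > 0 else None
    # Walk back from the tail of the longest run.
    result = []
    i = tail_index[-1] if tail_index else None
    while i is not None:
        result.append(i)
        i = previous[i]
    return result[::-1]


def numbered_fragments(fragments):
    """Return indices of fragments whose leading numbers form a longest increasing run."""
    numbered = []
    for i, fragment in enumerate(fragments):
        number = leading_number(fragment)
        if number is not None:
            numbered.append((i, number))
    picked = longest_increasing_subsequence([number for _, number in numbered])
    return [numbered[j][0] for j in picked]


fragments = ["1 Introduction", "see page 12", "2 Methods", "3 copies were made",
             "2005 was the year", "3 Results", "4 Discussion"]
print([fragments[i] for i in numbered_fragments(fragments)])
```

The selected indices could then drive the tagging step the abstract mentions; how tags are written is not covered here.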
    • 15. Invention grant
    • Methods and apparatuses for intra-document reference identification and resolution
    • US08352857B2
    • 2013-01-08
    • US12258627
    • 2008-10-27
    • Katja Filippova; Herve Dejean
    • G06F17/00
    • G06F17/2235
    • Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.
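The three steps in the abstract above (abstract reference profiles, pair each profile with an object fragment, associate reference fragments with the paired object) can be sketched as follows. The concrete rules are assumptions for illustration only: a profile is an (object type, number) pair such as ("Figure", "2"), an object fragment is one that starts with the mention followed by a colon or period (a caption-like line), and any other fragment mentioning the profile is a reference. The names `MENTION`, `CAPTION`, and `resolve_references` are hypothetical.

```python
import re

# Assumed object type identifiers; a real system would learn these from the document.
MENTION = re.compile(r"\b(Figure|Table|Section)\s+(\d+)\b")
CAPTION = re.compile(r"(Figure|Table|Section)\s+(\d+)[:.]")


def resolve_references(fragments):
    """Map each referencing fragment index to the object fragment indices it cites."""
    # Step 1: abstract reference profiles (type, number) from every mention.
    profiles = set()
    for fragment in fragments:
        for match in MENTION.finditer(fragment):
            profiles.add((match.group(1), match.group(2)))

    # Step 2: pair each profile with the first caption-like fragment carrying its number.
    objects = {}
    for index, fragment in enumerate(fragments):
        match = CAPTION.match(fragment)
        if match and (match.group(1), match.group(2)) in profiles:
            objects.setdefault((match.group(1), match.group(2)), index)

    # Step 3: associate every other fragment that satisfies a profile with its object.
    links = {}
    for index, fragment in enumerate(fragments):
        for match in MENTION.finditer(fragment):
            target = objects.get((match.group(1), match.group(2)))
            if target is not None and target != index:
                links.setdefault(index, []).append(target)
    return links


doc = [
    "As shown in Figure 2, the tokens are grouped.",
    "Figure 1: Input page.",
    "Figure 2: Grouped tokens.",
    "Table 1 lists the thresholds used for Figure 1.",
    "Table 1: Threshold values.",
]
print(resolve_references(doc))
```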
    • 16. Invention grant
    • Systems and methods for converting legacy and proprietary documents into extended mark-up language format
    • US07165216B2
    • 2007-01-16
    • US10756313
    • 2004-01-14
    • Boris Chidlovskii; Herve Dejean
    • G06F15/00
    • G06F17/30914; G06F17/227
    • A system and method that converts legacy and proprietary documents into extended mark-up language format which treats the conversion as transforming ordered trees of one schema and/or model into ordered trees of another schema and/or model. In embodiments, the tree transformers are coded using a learning method that decomposes the converting task into three components which include path re-labeling, structural composition and input tree traversal, each of which involves learning approaches. The transformation of an input tree into an output tree may involve labeling components in the input tree with valid labels or paths from a particular output schema, composing the labeled elements into the output tree with a valid structure, and finding such a traversal of the input tree that achieves the correct composition of the output tree and applies structural rules.
    • 17. Invention application
    • TYPOGRAPHICAL BLOCK GENERATION
    • US20130321867A1
    • 2013-12-05
    • US13484708
    • 2012-05-31
    • Herve Dejean
    • G06K15/02
    • G06F17/211
    • Embodiments of a computer-implemented method for grouping one or more token elements comprising one or more characters in an input file. The method comprises computing a first leading distance between a first baseline of a first token element and a second baseline of a second token element. The method further comprises defining a block with the first token element and the second token element, and characterizing the first leading distance as a leading distance of the block. The method further comprises computing a second leading distance between the second baseline and a third baseline of a third token element. The method furthermore comprises grouping the third token element into the block based on a first difference between the second leading distance and the leading distance of the block lying within a first predefined threshold value.
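The leading-distance test in the abstract above can be sketched as follows. Several details are assumptions for illustration: each token element is reduced to the y-coordinate of its baseline, the rule is applied repeatedly to every following token, and a token that fails the threshold test starts a new block (the abstract does not say what happens in that case). The name `group_into_blocks` and the default threshold are hypothetical.

```python
def group_into_blocks(baselines, threshold=1.0):
    """Group consecutive baselines into blocks with a roughly constant leading.

    baselines: baseline y-coordinates of token elements, in reading order.
    threshold: maximum allowed difference between a new leading distance
               and the block's leading distance (assumed value).
    """
    blocks = []
    current = []
    block_leading = None
    for y in baselines:
        if not current:
            # First token of a new block.
            current = [y]
        elif block_leading is None:
            # Second token: its distance to the first defines the block's leading.
            block_leading = y - current[-1]
            current.append(y)
        else:
            leading = y - current[-1]
            if abs(leading - block_leading) <= threshold:
                current.append(y)
            else:
                # Assumed behaviour: close the block and start a new one.
                blocks.append(current)
                current = [y]
                block_leading = None
    if current:
        blocks.append(current)
    return blocks


print(group_into_blocks([100, 112, 124, 136, 160, 172, 184]))
```

With these inputs the jump from 136 to 160 differs from the block leading of 12 by more than the threshold, so the remaining baselines form a second block.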
    • 18. Invention application
    • METHODS AND APPARATUSES FOR INTRA-DOCUMENT REFERENCE IDENTIFICATION AND RESOLUTION
    • US20100107045A1
    • 2010-04-29
    • US12258627
    • 2008-10-27
    • Katja Filippova; Herve Dejean
    • G06F17/00; G06F17/21; G06F17/30
    • G06F17/2235
    • Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.
    • 20. Invention application
    • Method and apparatus for structuring documents based on layout, content and collection
    • US20060155700A1
    • 2006-07-13
    • US11033016
    • 2005-01-10
    • Herve Dejean; Veronika Lux; Sandrine Ribeau
    • G06F17/24
    • G06F17/30914
    • A method and apparatus is provided for converting a document in a first format essentially comprising a flat layout structure into a structured document in a hierarchical form in accordance with predetermined attributes identified from the input format. The process comprises fragmenting the input document into a plurality of document content elements in accordance with a predetermined set of document attributes identifiable from the input document format. The content elements are clustered into selective sets having similar document attributes. The clustered sets are validated with reference to common textual properties organizational content common in documents in the collection. The clustered sets are then categorized into predetermined categories comprising structured elements of the structured document format and the document content elements are organized by hierarchical dependency from the predetermined categories wherein the organized document elements comprise the desired structured document format.