Patent Quick Search: search patents worldwide quickly, free commercial patent database - IPRDB

1. Granted patent

US07620539B2 Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing (status: in force)
Publication No.: US07620539B2
Publication date: 2009-11-17
Application No.: US10976847
Filing date: 2004-11-01
Applicant: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
Inventor: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
IPC classification: G06F17/28, G06F17/20, G06F17/27, G06F17/21, G06F17/30
CPC classification: G06F17/2785, G06F17/2735, G06F17/2827, Y10S707/99937
Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.
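
The abstract does not spell out the geometric formulation, so the following is only a minimal sketch of the textbook baseline for this task, not the patented methods: a source-language context vector is mapped into the target vocabulary through the bilingual dictionary and compared to a target-language context vector by cosine similarity. The function names, the dict-based vector representation, and the toy French/English entries are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def translate_vector(src_vec, dictionary):
    """Map a source-language context vector into the target vocabulary.

    src_vec: {source_word: weight}; dictionary: {source_word: [translations]}.
    Words missing from the dictionary are dropped.
    """
    out = defaultdict(float)
    for word, weight in src_vec.items():
        for translation in dictionary.get(word, ()):
            out[translation] += weight
    return dict(out)


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity(src_vec, tgt_vec, dictionary):
    """Similarity of a candidate bilingual pair, given their context vectors."""
    return cosine(translate_vector(src_vec, dictionary), tgt_vec)


# Hypothetical toy data: context vectors keyed by co-occurring words.
toy_dictionary = {"chien": ["dog"], "chat": ["cat"]}
score = similarity({"chien": 1.0}, {"dog": 2.0}, toy_dictionary)
```

A candidate pair whose translated context points in the same direction as the target context scores close to 1; pairs with no shared dictionary entries score 0.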

2. Patent application

US20060009963A1 Method and apparatus for identifying bilingual lexicons in comparable corpora (status: in force)
Publication No.: US20060009963A1
Publication date: 2006-01-12
Application No.: US10976847
Filing date: 2004-11-01
Applicant: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
Inventor: Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
IPC classification: G06F17/28
CPC classification: G06F17/2785, G06F17/2735, G06F17/2827, Y10S707/99937
Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.

3. Granted patent

US08352857B2 Methods and apparatuses for intra-document reference identification and resolution (status: in force)
Publication No.: US08352857B2
Publication date: 2013-01-08
Application No.: US12258627
Filing date: 2008-10-27
Applicant: Katja Filippova, Herve Dejean
Inventor: Katja Filippova, Herve Dejean
IPC classification: G06F17/00
CPC classification: G06F17/2235
Abstract: Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.

4. Granted patent

US07991709B2 Method and apparatus for structuring documents utilizing recognition of an ordered sequence of identifiers (status: in force)
Publication No.: US07991709B2
Publication date: 2011-08-02
Application No.: US12020743
Filing date: 2008-01-28
Applicant: Herve Dejean, Jean-Luc Meunier
Inventor: Herve Dejean, Jean-Luc Meunier
IPC classification: G06F15/18, G06F17/21, G06F17/22, G06F17/27
CPC classification: G06F17/211
Abstract: A method is provided for operating a computing device to create a document structure model of a computer parsable text document utilizing recognition of at least one ordered sequence of identifiers in the document. The method includes converting a computer parsable text document of any format to an alternative structured language format to form a converted document. The text of the converted document is fragmented into an ordered sequence of text fragments within a text format. The text fragments are enumerated to obtain a sequence of terms. At least one optimal sub-sequence of terms is identified from among the sequence of terms, with an optimal sub-sequence being one or more longest increasing sub-sequence(s). The computer parsable text document is annotated with tags, with the tags including information derived from identification of the optimal sub-sequence(s). The annotated document is displayed on the graphical user interface.
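
The longest-increasing-subsequence step named in the abstract can be illustrated with the standard O(n log n) algorithm. This is a generic sketch, not the patent's implementation: it assumes the terms have already been reduced to integers (for example, the leading numbers of candidate headings), and the function name is hypothetical.

```python
from bisect import bisect_left


def longest_increasing_subsequence(terms):
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []        # tails[k]: index of the smallest tail of a run of length k + 1
    tail_values = []  # terms[tails[k]], kept sorted so bisect can be used
    prev = [-1] * len(terms)  # back-pointers for reconstruction
    for i, value in enumerate(terms):
        k = bisect_left(tail_values, value)
        if k > 0:
            prev[i] = tails[k - 1]
        if k == len(tails):
            tails.append(i)
            tail_values.append(value)
        else:
            tails[k] = i
            tail_values[k] = value
    # Walk the back-pointers from the tail of the longest run.
    indices = []
    i = tails[-1] if tails else -1
    while i != -1:
        indices.append(i)
        i = prev[i]
    return indices[::-1]


# Hypothetical terms: 7 and 12 are stray numbers that break the ordering.
terms = [1, 7, 2, 3, 12, 4, 5]
kept = longest_increasing_subsequence(terms)
```

For the sample input the kept indices select the values 1, 2, 3, 4, 5; the fragments at the skipped positions would be treated as noise rather than as members of the ordered identifier sequence.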

5. Granted patent

US07165216B2 Systems and methods for converting legacy and proprietary documents into extended mark-up language format (status: lapsed)
Publication No.: US07165216B2
Publication date: 2007-01-16
Application No.: US10756313
Filing date: 2004-01-14
Applicant: Boris Chidlovskii, Herve Dejean
Inventor: Boris Chidlovskii, Herve Dejean
IPC classification: G06F15/00
CPC classification: G06F17/30914, G06F17/227
Abstract: A system and method that converts legacy and proprietary documents into extended mark-up language format which treats the conversion as transforming ordered trees of one schema and/or model into ordered trees of another schema and/or model. In embodiments, the tree transformers are coded using a learning method that decomposes the converting task into three components which include path re-labeling, structural composition and input tree traversal, each of which involves learning approaches. The transformation of an input tree into an output tree may involve labeling components in the input tree with valid labels or paths from a particular output schema, composing the labeled elements into the output tree with a valid structure, and finding such a traversal of the input tree that achieves the correct composition of the output tree and applies structural rules.

6. Patent application

US20060156226A1 Method and apparatus for detecting pagination constructs including a header and a footer in legacy documents (status: in force)
Publication No.: US20060156226A1
Publication date: 2006-07-13
Application No.: US11032817
Filing date: 2005-01-10
Applicant: Herve Dejean, Jean-Luc Meunier
Inventor: Herve Dejean, Jean-Luc Meunier
IPC classification: G06F17/00, G06F17/21
CPC classification: G06F17/217, G06F17/2745
Abstract: A method for identifying header/footer content of a document, in order to sequence text fragments comprising recognizable text blocks as derived from the document. The textual variability of lines comprised of text blocks, including the different kinds of text blocks within the line, is analyzed for assessment of textual variability. Header/footer zones are defined by textual content having a low textual variability. An alternative embodiment identifies pagination constructs by comparing selected text-boxes for similarity and proximity and clustering the text boxes satisfying a predetermined similarity value, wherein the clustered text boxes are deemed to comprise pagination constructs.
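
One way to read "low textual variability" is sketched below, under assumptions that are not in the abstract: each page is a list of lines from top to bottom, digits are normalized so that running page numbers count as the same text, and variability at a line position is the ratio of distinct texts to pages observed. The function names and the 0.5 threshold are hypothetical.

```python
import re
from collections import defaultdict


def normalize(line):
    """Lower-case a line and replace digit runs so 'Page 1' matches 'Page 2'."""
    return re.sub(r"[0-9]+", "#", line.strip().lower())


def low_variability_positions(pages, max_variability=0.5):
    """Return line positions whose text barely changes from page to page.

    Positions 0, 1, ... count from the top of the page; -1, -2, ... count
    from the bottom. Positions seen on fewer than two pages are ignored.
    """
    by_position = defaultdict(list)
    for lines in pages:
        for offset, line in enumerate(lines):
            text = normalize(line)
            by_position[offset].append(text)               # from the top
            by_position[offset - len(lines)].append(text)  # from the bottom
    result = []
    for position, texts in by_position.items():
        if len(texts) < 2:
            continue
        variability = len(set(texts)) / len(texts)
        if variability <= max_variability:
            result.append(position)
    return sorted(result)


# Hypothetical three-page document with a running title and page numbers.
sample_pages = [
    ["My Report", "alpha text", "Page 1"],
    ["My Report", "beta text here", "more beta", "Page 2"],
    ["My Report", "gamma", "Page 3"],
]
zones = low_variability_positions(sample_pages)
```

On the sample pages only the first line (position 0) and the last line (position -1) fall under the threshold, which matches the intuition that headers and footers repeat while body text does not.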

7. Patent application

US20130321867A1 TYPOGRAPHICAL BLOCK GENERATION (status: under examination, published)
Publication No.: US20130321867A1
Publication date: 2013-12-05
Application No.: US13484708
Filing date: 2012-05-31
Applicant: Herve Dejean
Inventor: Herve Dejean
IPC classification: G06K15/02
CPC classification: G06F17/211
Abstract: Embodiments of a computer-implemented method for grouping one or more token elements comprising one or more characters in an input file. The method comprises computing a first leading distance between a first baseline of a first token element, and a second baseline of a second token element. The method further comprises defining a block with the first token element and the second token element, and characterizing the first leading distance as a leading distance of the block. The method further comprises computing a second leading distance between the second baseline and a third baseline of a third token element. The method further comprises grouping the third token element into the block based on a first difference between the second leading distance and the leading distance of the block lying within a first predefined threshold value.
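
The abstract describes the three-token case; the sketch below extends it to a whole column of lines as an illustration. Representing each token by a single baseline coordinate, starting a new block when the leading changes, and the default threshold of 1.0 are assumptions, and the function name is hypothetical.

```python
def group_by_leading(baselines, threshold=1.0):
    """Group consecutive baselines into blocks with a consistent leading.

    baselines: baseline y-coordinates in reading order.
    The first two baselines of a block define its leading; a later baseline
    joins the block only if its distance to the previous baseline differs
    from that leading by at most `threshold`.
    """
    blocks = []
    current = []
    leading = None
    for y in baselines:
        if not current:
            current = [y]
        elif leading is None:
            leading = y - current[-1]  # first leading characterizes the block
            current.append(y)
        elif abs((y - current[-1]) - leading) <= threshold:
            current.append(y)
        else:
            blocks.append(current)     # leading changed: close the block
            current = [y]
            leading = None
    if current:
        blocks.append(current)
    return blocks


# Hypothetical baselines: two paragraphs set with 12-unit leading.
blocks = group_by_leading([100, 112, 124, 150, 162])
```

With the sample input the gap of 26 units between 124 and 150 exceeds the block's leading of 12 by more than the threshold, so the result is two blocks. Note that in this simplified form the second token always joins the first, whatever the distance between them.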

8. Patent application

US20100107045A1 METHODS AND APPARATUSES FOR INTRA-DOCUMENT REFERENCE IDENTIFICATION AND RESOLUTION (status: in force)
Publication No.: US20100107045A1
Publication date: 2010-04-29
Application No.: US12258627
Filing date: 2008-10-27
Applicant: Katja Filippova, Herve Dejean
Inventor: Katja Filippova, Herve Dejean
IPC classification: G06F17/00, G06F17/21, G06F17/30
CPC classification: G06F17/2235
Abstract: Reference identification and resolution identifies reference text fragments in a document and associates referenced object text fragments in the document with the identified reference text fragments. Reference profiles are abstracted from the document. Each reference profile specifies at least a reference number and an object type identifier. A reference profile is paired with an object text fragment of the document containing the reference number of the reference profile. The pairing is repeated to associate reference profiles with object text fragments. A reference text fragment of the document satisfying one of the reference profiles is associated with the object text fragment paired with the satisfied reference profile. The associating is repeated to associate reference text fragments of the document with object text fragments.

9. Patent application

US20080114757A1 Versatile page number detector (status: in force)
Publication No.: US20080114757A1
Publication date: 2008-05-15
Application No.: US11599947
Filing date: 2006-11-15
Applicant: Herve Dejean, Jean-Luc Meunier
Inventor: Herve Dejean, Jean-Luc Meunier
IPC classification: G06F7/20, G06F17/30
CPC classification: G06F17/30569, G06K9/00469
Abstract: A method for detection of page numbers in a document includes identifying a plurality of text fragments associated with a plurality of pages of a document. From the identified text fragments, at least one sequence is identified. Each identified sequence includes a plurality of terms. Each term of the sequence is derived from a text fragment selected from the plurality of text fragments. The terms of an identified sequence comply with at least one predefined numbering scheme which defines a form and an incremental state of the terms in a sequence. A subset of the identified sequences which cover at least some of the pages of the document is computed. Terms of at least some of the subset of the identified sequences are construed as page numbers of pages of the document. Additional page numbers may be identified by considering one or more features of the terms in the subset of identified sequences.
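
The sequence-finding step can be sketched for a single numbering scheme. The assumptions here go beyond the abstract: only Arabic numerals are considered, a term must increase by exactly one from each page to the next, and the later steps (choosing a covering subset of sequences, using term features to find additional page numbers) are left out. The function name and the sample pages are hypothetical.

```python
import re


def find_page_number_sequences(pages):
    """Find runs of numbers that increase by one from page to page.

    pages: one list of text fragments per page, in page order.
    Returns runs as lists of (page_index, number), longest first.
    """
    finished = []
    open_runs = {}  # last number of a run -> run ending on the previous page
    for page_index, fragments in enumerate(pages):
        candidates = {
            int(text)
            for text in (fragment.strip() for fragment in fragments)
            if re.fullmatch(r"[0-9]+", text)
        }
        next_runs = {}
        for number in candidates:
            # Extend the run that ended with number - 1, or start a new one.
            run = open_runs.pop(number - 1, [])
            next_runs[number] = run + [(page_index, number)]
        finished.extend(open_runs.values())  # runs that could not be extended
        open_runs = next_runs
    finished.extend(open_runs.values())
    return sorted(finished, key=len, reverse=True)


# Hypothetical fragments: the stray "2" on the third page is not a page number.
sample_pages = [["Title", "1"], ["2", "In 2004 ..."], ["3", "2"], ["Appendix", "4"]]
sequences = find_page_number_sequences(sample_pages)
```

For the sample pages the longest run covers all four pages with the numbers 1 to 4, while the stray "2" forms a run of length one that a later selection step could discard.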

10. Patent application

US20080077847A1 Captions detector (status: in force)
Publication No.: US20080077847A1
Publication date: 2008-03-27
Application No.: US11528261
Filing date: 2006-09-27
Applicant: Herve Dejean
Inventor: Herve Dejean
IPC classification: G06F17/00, G06F17/20, G06F17/21, G06F17/22, G06F17/24, G06F17/25, G06F17/26, G06F17/27, G06F17/28
CPC classification: G06F17/2745
Abstract: To detect captions in a document that includes text fragments and objects of interest, a signature is assigned to each text fragment. The signature is the value for that text fragment of a text fragment representation comprising at least one text fragment attribute. A caption signature is identified as a signature assigned to a substantial number of text fragments that are near at least one object of interest in the document. One or more captions are detected as one or more text fragments each assigned a caption signature.
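
A minimal sketch of the signature idea follows. The choice of attributes (font name and size), the precomputed `near_object` flag, and the reading of "a substantial number" as a simple count threshold are all assumptions made for illustration; the abstract does not fix any of them.

```python
from collections import Counter


def signature(fragment):
    """Signature of a text fragment: here, its (font, size) pair."""
    return (fragment["font"], fragment["size"])


def detect_captions(fragments, min_count=2):
    """Return the texts of fragments whose signature is a caption signature.

    fragments: dicts with "text", "font", "size" and a boolean "near_object"
    that says whether the fragment lies near an object of interest.
    A signature counts as a caption signature when at least `min_count`
    fragments near an object carry it.
    """
    near_counts = Counter(signature(f) for f in fragments if f["near_object"])
    caption_signatures = {s for s, n in near_counts.items() if n >= min_count}
    return [f["text"] for f in fragments if signature(f) in caption_signatures]


# Hypothetical fragments: captions are set in 8-point Helvetica.
sample_fragments = [
    {"text": "Figure 1: Setup", "font": "Helvetica", "size": 8, "near_object": True},
    {"text": "Body paragraph", "font": "Times", "size": 10, "near_object": True},
    {"text": "Figure 2: Results", "font": "Helvetica", "size": 8, "near_object": True},
    {"text": "Another paragraph", "font": "Times", "size": 10, "near_object": False},
    {"text": "Table 1: Data", "font": "Helvetica", "size": 8, "near_object": False},
]
captions = detect_captions(sample_fragments)
```

In the sample, "Table 1: Data" is reported as a caption even though it is not flagged as near an object, because it shares the signature learned from the fragments that are.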
