Quick Patent Search: fast search of global patents, free commercial patent database - IPRDB

11. Granted patent

US08189930B2 Categorizer with user-controllable calibration (in force)
Publication No.: US08189930B2
Publication date: 2012-05-29
Application No.: US12174721
Filing date: 2008-07-17
Applicant(s): Jean-Michel Renders, Caroline Privault, Eric H. Cheminot
Inventor(s): Jean-Michel Renders, Caroline Privault, Eric H. Cheminot
IPC classification: G06K9/74, G06K9/62
CPC classification: G06K9/6277
Abstract: A calibrated categorizer comprises: a multi-class categorizer configured to output class probabilities for an input object corresponding to a set of classes; a class probabilities rescaler configured to rescale class probabilities to generate rescaled class probabilities; and a rescaling model learner configured to learn calibration parameters for the class probabilities rescaler based on (i) class probabilities output by the multi-class categorizer for a calibration set of class-labeled objects, (ii) confidence measures output by the multi-class categorizer for the calibration set of class-labeled objects, and (iii) class labels of the calibration set of class-labeled objects, the class probabilities rescaler calibrated by the learned calibration parameters defining a calibrated class probabilities rescaler. In a method embodiment, class probabilities are generated for an input object corresponding to a set of classes using a classifier trained on a first set of objects, and are rescaled to form rescaled class probabilities using a rescaling algorithm calibrated using a second set of objects different from the first set of objects. The method may further entail thresholding the rescaled class probabilities using thresholds calibrated using the second set of objects or a third set of objects.
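
The rescaling idea in this abstract can be illustrated with a minimal sketch. It assumes temperature scaling as the rescaling model, which is a stand-in and not necessarily the method claimed in the patent, and it ignores the confidence measures the abstract also mentions. The function names and the calibration numbers are hypothetical.

```python
import math

def rescale(probs, temperature):
    """Raise each class probability to 1/temperature and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]

def negative_log_likelihood(prob_rows, labels, temperature):
    """Mean negative log-probability given to the true class after rescaling."""
    eps = 1e-12  # guards against log(0)
    loss = 0.0
    for probs, label in zip(prob_rows, labels):
        loss -= math.log(rescale(probs, temperature)[label] + eps)
    return loss / len(labels)

def learn_temperature(prob_rows, labels, candidates=None):
    """Pick the candidate temperature with the lowest loss on the calibration set."""
    if candidates is None:
        candidates = [0.25 * k for k in range(1, 41)]  # 0.25 .. 10.0, includes 1.0
    return min(candidates,
               key=lambda t: negative_log_likelihood(prob_rows, labels, t))

# Calibration set: categorizer outputs on held-out labeled objects (made-up numbers).
calibration_probs = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1],
                     [0.1, 0.85, 0.05], [0.7, 0.2, 0.1]]
calibration_labels = [0, 1, 1, 0]  # the second object was confidently misclassified

temperature = learn_temperature(calibration_probs, calibration_labels)
rescaled = rescale([0.9, 0.05, 0.05], temperature)
```

The calibration set is kept separate from the training set, as in the method embodiment; thresholds on the rescaled probabilities could be tuned on the same held-out data.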

12. Patent application

US20080249999A1 Interactive cleaning for automatic document clustering and categorization (in force)
Publication No.: US20080249999A1
Publication date: 2008-10-09
Application No.: US11784321
Filing date: 2007-04-06
Applicant(s): Jean-Michel Renders, Caroline Privault, Ludovic Menuge
Inventor(s): Jean-Michel Renders, Caroline Privault, Ludovic Menuge
IPC classification: G06F17/30
CPC classification: G06F17/3071, Y10S707/953
Abstract: Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user-selected outlier criterion. Ambiguity measures are computed for the documents indicative of a number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
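
As a rough illustration of the three checks described here, the sketch below assumes each document comes with a per-class similarity in [0, 1] under the model. The specific formulas (one minus the best similarity, and the exponential of the entropy of the normalized similarities) are illustrative choices, not definitions taken from the application.

```python
import math

def outlier_measure(similarities):
    """Higher when no class fits the document well: 1 - best class similarity."""
    return 1.0 - max(similarities.values())

def ambiguity_measure(similarities):
    """Effective number of classes the document resembles: exp(entropy)."""
    total = sum(similarities.values())
    if total == 0.0:
        return float(len(similarities))  # no evidence: every class equally (un)likely
    shares = [s / total for s in similarities.values() if s > 0.0]
    return math.exp(-sum(p * math.log(p) for p in shares))

def suggest_label(similarities, annotated_class):
    """Return a corrective class if it fits better than the annotated one."""
    best = max(similarities, key=similarities.get)
    if similarities[best] > similarities[annotated_class]:
        return best
    return None

doc = {"invoice": 0.7, "contract": 0.2, "letter": 0.1}
is_outlier = outlier_measure(doc) > 0.5   # 0.5 plays the user-selected criterion
correction = suggest_label(doc, "contract")
```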

13. Granted patent

US09189473B2 System and method for resolving entity coreference (in force)
Publication No.: US09189473B2
Publication date: 2015-11-17
Application No.: US13475250
Filing date: 2012-05-18
Applicant(s): Matthias Gallé, Jean-Michel Renders, Guillaume Jacquet
Inventor(s): Matthias Gallé, Jean-Michel Renders, Guillaume Jacquet
IPC classification: G06F17/30, G06F17/27
CPC classification: G06F17/278, G06F17/2795
Abstract: A method and a system for coreference resolution are provided. The method includes receiving a set of document clusters, each cluster in the set of document clusters including a set of text documents. Instances of each of a set of candidate named entities are identified in the document clusters. For pairs of the candidate named entities, at least one socio-temporal feature is computed that is based on the similarity of the distributions of identified instances of the respective candidate named entities among the document clusters. A decision on merging the candidate named entities into a common real named entity is based on the socio-temporal features.
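
One way to picture the distribution-based feature is sketched below. It assumes each document has already been reduced to a list of identified entity mentions, and it uses cosine similarity between per-cluster mention counts; the patent may use a different similarity, and the names and the merge threshold are hypothetical.

```python
import math

def cluster_distribution(name, clusters):
    """Count the mentions of `name` in each document cluster."""
    return [sum(doc.count(name) for doc in docs) for docs in clusters]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def distribution_similarity(name_a, name_b, clusters):
    """Similarity of how two candidate names are spread over the clusters."""
    return cosine(cluster_distribution(name_a, clusters),
                  cluster_distribution(name_b, clusters))

# Each cluster is a list of documents; each document is a list of mentions.
clusters = [
    [["Barack Obama", "B. Obama"], ["B. Obama"]],
    [["Angela Merkel"], ["Angela Merkel", "Barack Obama"]],
    [["B. Obama", "Barack Obama"]],
]
same_entity = distribution_similarity("Barack Obama", "B. Obama", clusters) >= 0.7
```

In practice such a feature would be one input among several to the merge decision, alongside string-level evidence.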

14. Granted patent

US08892562B2 Categorization of multi-page documents by anisotropic diffusion (in force)
Publication No.: US08892562B2
Publication date: 2014-11-18
Application No.: US13558814
Filing date: 2012-07-26
Applicant(s): Jean-Michel Renders, François Ragnet, Damien Cramet
Inventor(s): Jean-Michel Renders, François Ragnet, Damien Cramet
IPC classification: G06F17/30
CPC classification: G06F17/30265, G06F17/30256, G06K9/00483
Abstract: A computer-implemented system and method are provided for refining category scores for pages of a sequence of document pages that potentially includes document boundaries. The method uses initial category scores provided by a categorizer that considers one page at a time or concatenated pairs of pages (called bipages). The category scores represent the probability that a page belongs to a particular category. The method uses anisotropic diffusion to refine the initial page category scores using the scores of neighboring pages as a function of the probability that there is a boundary between the pages. The method may be performed iteratively.
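
The diffusion step can be sketched as follows, assuming a simple explicit update in which the exchange between two adjacent pages is scaled by one minus the boundary probability between them. The update rule, the `rate` parameter and the example numbers are assumptions for illustration, not values from the patent.

```python
def diffuse(scores, boundary_probs, rate=0.25, iterations=10):
    """Smooth page category scores along the page sequence.

    scores[i][c]      : initial score of page i for category c
    boundary_probs[i] : probability of a document boundary between page i and i + 1
    """
    current = [row[:] for row in scores]
    n = len(current)
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for i in range(n):
            for c in range(len(current[i])):
                flow = 0.0
                if i > 0:  # pull toward the previous page unless a boundary is likely
                    flow += (1.0 - boundary_probs[i - 1]) * (current[i - 1][c] - current[i][c])
                if i < n - 1:  # pull toward the next page unless a boundary is likely
                    flow += (1.0 - boundary_probs[i]) * (current[i + 1][c] - current[i][c])
                updated[i][c] = current[i][c] + rate * flow
        current = updated
    return current

# Four pages, two categories; a certain boundary sits between pages 1 and 2.
initial = [[0.9, 0.1], [0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]
boundaries = [0.0, 1.0, 0.0]
refined = diffuse(initial, boundaries)
```

With `rate` at or below 0.5 each update is a convex combination of neighboring scores, so rows that start as probability distributions remain distributions.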

15. Patent application

US20140032558A1 CATEGORIZATION OF MULTI-PAGE DOCUMENTS BY ANISOTROPIC DIFFUSION (in force)
Publication No.: US20140032558A1
Publication date: 2014-01-30
Application No.: US13558814
Filing date: 2012-07-26
Applicant(s): Jean-Michel Renders, François Ragnet, Damien Cramet
Inventor(s): Jean-Michel Renders, François Ragnet, Damien Cramet
IPC classification: G06F17/30
CPC classification: G06F17/30265, G06F17/30256, G06K9/00483
Abstract: A computer-implemented system and method are provided for refining category scores for pages of a sequence of document pages that potentially includes document boundaries. The method uses initial category scores provided by a categorizer that considers one page at a time or concatenated pairs of pages (called bipages). The category scores represent the probability that a page belongs to a particular category. The method uses anisotropic diffusion to refine the initial page category scores using the scores of neighboring pages as a function of the probability that there is a boundary between the pages. The method may be performed iteratively.

16. Granted patent

US07620539B2 Methods and apparatuses for identifying bilingual lexicons in comparable corpora using geometric processing (in force)
Publication No.: US07620539B2
Publication date: 2009-11-17
Application No.: US10976847
Filing date: 2004-11-01
Applicant(s): Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
Inventor(s): Eric Gaussier, Jean-Michel Renders, Herve Dejean, Cyril Goutte, Irina Matveeva
IPC classification: G06F17/28, G06F17/20, G06F17/27, G06F17/21, G06F17/30
CPC classification: G06F17/2785, G06F17/2735, G06F17/2827, Y10S707/99937
Abstract: Various methods formulated using a geometric interpretation for identifying bilingual pairs in comparable corpora using a bilingual dictionary are disclosed. The methods may be used separately or in combination to compute the similarity between bilingual pairs.

17. Granted patent

US08880525B2 Full and semi-batch clustering (in force)
Publication No.: US08880525B2
Publication date: 2014-11-04
Application No.: US13437079
Filing date: 2012-04-02
Applicant(s): Matthias Galle, Jean-Michel Renders
Inventor(s): Matthias Galle, Jean-Michel Renders
IPC classification: G06F17/30
CPC classification: G06F17/30, G06F17/30707
Abstract: A method for clustering documents is provided. Each document is represented by a multidimensional data point. Each data point is initially assigned to a respective cluster and serves as that cluster's initial representative point. Thereafter, in an iterative process, the data points are clustered among the clusters, by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
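
The iterative loop described in this abstract might look like the sketch below. It assumes cosine similarity as the comparison measure, centroids as representative points, and a deliberately simple merge rule (clusters with identical membership are collapsed); none of these choices is confirmed by the patent, and batch handling is left out.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]

def threshold_cluster(points, threshold=0.9, iterations=5):
    """Return clusters as sorted lists of point indices (clusters may overlap)."""
    representatives = [p[:] for p in points]  # every point starts as its own cluster
    members = []
    for _ in range(iterations):
        # Assign each point to every cluster whose representative is similar enough.
        assigned = [
            frozenset(i for i, p in enumerate(points) if cosine(p, rep) >= threshold)
            for rep in representatives
        ]
        # Collapse clusters with identical membership and drop empty ones.
        members = sorted({m for m in assigned if m}, key=sorted)
        # Recompute one representative point per surviving cluster.
        representatives = [centroid([points[i] for i in m]) for m in members]
    return [sorted(m) for m in members]

documents = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = threshold_cluster(documents)
```

A fixed iteration count is used here for brevity; a real implementation would more likely stop once the assignment no longer changes.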

18. Patent application

US20120203752A1 LARGE SCALE UNSUPERVISED HIERARCHICAL DOCUMENT CATEGORIZATION USING ONTOLOGICAL GUIDANCE (in force)
Publication No.: US20120203752A1
Publication date: 2012-08-09
Application No.: US13022766
Filing date: 2011-02-08
Applicant(s): Viet Ha-Thuc, Jean-Michel Renders
Inventor(s): Viet Ha-Thuc, Jean-Michel Renders
IPC classification: G06F17/30
CPC classification: G06F17/30705
Abstract: A classification method includes constructing queries from category descriptors representing categories of a taxonomy of hierarchically organized categories. The query constructed for a category c includes a query component based on descriptors of the category c and at least one query component based on descriptors of an ancestor or descendant category of the category c. A document database is queried using the constructed queries to retrieve pseudo-relevant documents. Language models for the categories of the taxonomy are extracted from the pseudo-relevant documents by inferring a hierarchical topic model representing the taxonomy. An input document is classified by optimizing mixture weights of a weighted combination of categories of the hierarchical topic model respective to the input document.
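
The query construction step can be sketched on its own. The example assumes a taxonomy given as a child-to-parent map and a geometric down-weighting of ancestor descriptors; the weighting scheme and all names are hypothetical, and the later topic-model inference is not covered.

```python
def build_query(category, descriptors, parent, ancestor_weight=0.5):
    """Return a term -> weight query for `category`.

    Terms from the category's own descriptors get weight 1.0; terms from each
    successive ancestor are multiplied by `ancestor_weight` once more.
    """
    query = {}
    weight = 1.0
    node = category
    while node is not None:
        for term in descriptors[node]:
            query[term] = max(query.get(term, 0.0), weight)
        node = parent.get(node)  # None once the root has been processed
        weight *= ancestor_weight
    return query

descriptors = {
    "science": ["science"],
    "physics": ["physics"],
    "optics": ["optics", "light"],
}
parent = {"optics": "physics", "physics": "science"}

query = build_query("optics", descriptors, parent)
```

Including ancestor terms keeps the retrieved pseudo-relevant documents on topic for categories whose own descriptors are ambiguous, such as "light".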

19. Patent application

US20100082615A1 CROSS-MEDIA SIMILARITY MEASURES THROUGH TRANS-MEDIA PSEUDO-RELEVANCE FEEDBACK AND DOCUMENT RERANKING (in force)
Publication No.: US20100082615A1
Publication date: 2010-04-01
Application No.: US12233978
Filing date: 2008-09-19
Applicant(s): Stephane Clinchant, Jean-Michel Renders
Inventor(s): Stephane Clinchant, Jean-Michel Renders
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30256, G06F17/30265
Abstract: A multimedia information retrieval system includes a storage and an electronic processing device. The latter is configured to perform a process including: computing values of a pairwise similarity measure quantifying pairwise similarity of documents of a multimedia reference repository; storing the computed values in the storage; performing an initial information retrieval process respective to the multimedia reference repository to return a set of initial repository documents; and identifying a set of top ranked documents of the multimedia reference repository based at least on the stored computed values pertaining to the set of initial repository documents.
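
A minimal sketch of the precompute-then-rerank flow follows. It assumes documents are represented by small feature vectors, uses a dot product as the pairwise similarity, and scores each repository document by its similarity to the initial results weighted by their retrieval scores; these are illustrative assumptions rather than the patented scoring.

```python
def precompute_similarities(features):
    """Compute and store the similarity of every ordered document pair."""
    return {
        (a, b): sum(x * y for x, y in zip(fa, fb))
        for a, fa in features.items()
        for b, fb in features.items()
    }

def rerank(initial_results, pairwise_similarity, top_k=3):
    """Rank repository documents by stored similarity to the initial results.

    initial_results     : {doc_id: score} from the first retrieval pass
    pairwise_similarity : {(doc_a, doc_b): value} computed ahead of time
    """
    doc_ids = {doc for pair in pairwise_similarity for doc in pair}
    scores = {
        candidate: sum(
            score * pairwise_similarity.get((seed, candidate), 0.0)
            for seed, score in initial_results.items()
        )
        for candidate in doc_ids
    }
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_k]

features = {"img1": [1.0, 0.0], "img2": [0.9, 0.1], "txt1": [0.0, 1.0]}
stored = precompute_similarities(features)
top_documents = rerank({"img1": 1.0}, stored, top_k=2)
```

Because the similarities are computed once and stored, the second stage only needs lookups at query time.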

20. Patent application

US20100070521A1 QUERY TRANSLATION THROUGH DICTIONARY ADAPTATION (in force)
Publication No.: US20100070521A1
Publication date: 2010-03-18
Application No.: US12233135
Filing date: 2008-09-18
Applicant(s): Stephane Clinchant, Jean-Michel Renders
Inventor(s): Stephane Clinchant, Jean-Michel Renders
IPC classification: G06F17/30
CPC classification: G06F17/30669
Abstract: Cross-lingual information retrieval is disclosed, comprising: translating a received query from a source natural language into a target natural language; performing a first information retrieval operation on a corpus of documents in the target natural language using the translated query to retrieve a set of pseudo-feedback documents in the target natural language; re-translating the received query from the source natural language into the target natural language using a translation model derived from the set of pseudo-feedback documents in the target natural language; and performing a second information retrieval operation on the corpus of documents in the target natural language using the re-translated query to retrieve an updated set of documents in the target natural language.
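
The four steps in this abstract (translate, retrieve, adapt, retrieve again) can be traced with a toy sketch. It assumes a probabilistic bilingual dictionary, a term-overlap ranking function, and an adaptation rule that reweights each candidate translation by its frequency in the pseudo-feedback documents; all of these are simplifications chosen for illustration, and the data is made up.

```python
from collections import Counter

def translate(query_terms, dictionary):
    """Weighted target-language query: each candidate translation, by probability."""
    weights = Counter()
    for term in query_terms:
        for target, prob in dictionary.get(term, {}).items():
            weights[target] += prob
    return weights

def retrieve(weighted_query, corpus, top_k):
    """Rank documents (token lists) by the summed weight of their matching tokens."""
    def score(doc_id):
        return sum(weighted_query.get(tok, 0.0) for tok in corpus[doc_id])
    return sorted(corpus, key=lambda d: (-score(d), d))[:top_k]

def adapt_dictionary(query_terms, dictionary, feedback_docs):
    """Boost translations that are frequent in the pseudo-feedback documents."""
    counts = Counter(tok for doc in feedback_docs for tok in doc)
    adapted = {}
    for term in query_terms:
        raw = {t: p * (1 + counts[t]) for t, p in dictionary.get(term, {}).items()}
        total = sum(raw.values())
        adapted[term] = {t: w / total for t, w in raw.items()} if total else {}
    return adapted

dictionary = {"avocat": {"lawyer": 0.5, "avocado": 0.5}, "tribunal": {"court": 1.0}}
corpus = {
    "d1": ["lawyer", "court", "case"],
    "d2": ["avocado", "salad"],
    "d3": ["court", "lawyer", "judge", "lawyer"],
}
query = ["avocat", "tribunal"]

first_pass = retrieve(translate(query, dictionary), corpus, top_k=2)
adapted = adapt_dictionary(query, dictionary, [corpus[d] for d in first_pass])
second_pass = retrieve(translate(query, adapted), corpus, top_k=2)
```

In this toy run the feedback documents disambiguate "avocat" toward "lawyer", which is the kind of effect the re-translation step is meant to achieve.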
