    • 11. Granted invention patent
    • Interactive cleaning for automatic document clustering and categorization
    • US07711747B2
    • 2010-05-04
    • US11784321
    • 2007-04-06
    • Jean-Michel Renders, Caroline Privault, Ludovic Menuge
    • Jean-Michel Renders, Caroline Privault, Ludovic Menuge
    • G06F17/30
    • G06F17/3071, Y10S707/953
    • Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user selected outlier criterion. Ambiguity measures are computed for the documents indicative of a number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
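The outlier, ambiguity, and corrective-label checks described in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the use of per-class similarity scores in [0, 1], the `1 - max` outlier score, and the `ratio` cutoff for counting similar classes are all hypothetical and are not the measures defined in the patent.

```python
def outlier_score(sims):
    """Assumed outlier measure: 1 minus the best class similarity."""
    return 1.0 - max(sims)


def ambiguity(sims, ratio=0.8):
    """Assumed ambiguity measure: number of classes whose similarity
    is at least `ratio` times the best class similarity."""
    cutoff = ratio * max(sims)
    return sum(1 for s in sims if s >= cutoff)


def corrective_label(sims, annotated):
    """Return the class the model prefers over the annotated one, else None."""
    best = max(range(len(sims)), key=lambda k: sims[k])
    return best if sims[best] > sims[annotated] else None


def flag_outliers(doc_sims, threshold):
    """Indices of documents whose outlier score exceeds the user's threshold."""
    return [i for i, sims in enumerate(doc_sims) if outlier_score(sims) > threshold]


# Hypothetical similarities of three documents to three classes.
doc_sims = [
    [0.90, 0.10, 0.05],  # fits class 0 well
    [0.50, 0.45, 0.10],  # ambiguous between classes 0 and 1
    [0.20, 0.15, 0.10],  # fits no class well
]
outliers = flag_outliers(doc_sims, threshold=0.6)
suggestion = corrective_label(doc_sims[1], annotated=1)
```

In this sketch the user-selected outlier criterion is a single threshold; a ranked list or a top-N cutoff would serve the same purpose.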
    • 12. Granted invention patent
    • Categorizer with user-controllable calibration
    • US08189930B2
    • 2012-05-29
    • US12174721
    • 2008-07-17
    • Jean-Michel Renders, Caroline Privault, Eric H. Cheminot
    • Jean-Michel Renders, Caroline Privault, Eric H. Cheminot
    • G06K9/74, G06K9/62
    • G06K9/6277
    • A calibrated categorizer comprises: a multi-class categorizer configured to output class probabilities for an input object corresponding to a set of classes; a class probabilities rescaler configured to rescale class probabilities to generate rescaled class probabilities; and a rescaling model learner configured to learn calibration parameters for the class probabilities rescaler based on (i) class probabilities output by the multi-class categorizer for a calibration set of class-labeled objects, (ii) confidence measures output by the multi-class categorizer for the calibration set of class-labeled objects, and (iii) class labels of the calibration set of class-labeled objects, the class probabilities rescaler calibrated by the learned calibration parameters defining a calibrated class probabilities rescaler. In a method embodiment, class probabilities are generated for an input object corresponding to a set of classes using a classifier trained on a first set of objects, and are rescaled to form rescaled class probabilities using a rescaling algorithm calibrated using a second set of objects different from the first set of objects. The method may further entail thresholding the rescaled class probabilities using thresholds calibrated using the second set of objects or a third set of objects.
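The idea of rescaling class probabilities with parameters learned on a separate calibration set can be sketched as below. This sketch swaps in plain temperature scaling with a grid search, which is a stand-in and not the rescaling model of the patent; it also ignores the confidence measures mentioned in the abstract. The names `rescale`, `mean_nll`, and `learn_temperature` and the grid of candidate temperatures are assumptions.

```python
import math


def rescale(probs, temperature):
    """Raise each class probability to 1/temperature and renormalize."""
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [p / total for p in powered]


def mean_nll(prob_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels after rescaling."""
    eps = 1e-12  # guards against log(0)
    losses = [
        -math.log(rescale(row, temperature)[y] + eps)
        for row, y in zip(prob_rows, labels)
    ]
    return sum(losses) / len(losses)


def learn_temperature(prob_rows, labels, grid=None):
    """Pick the temperature that minimizes the loss on the calibration set."""
    grid = grid or [0.25 * k for k in range(1, 41)]  # 0.25, 0.50, ..., 10.0
    return min(grid, key=lambda t: mean_nll(prob_rows, labels, t))


# Hypothetical calibration set: the classifier says 0.99 for class 0
# on every object, but class 0 is correct only three times out of four.
calibration_probs = [[0.99, 0.01]] * 4
calibration_labels = [0, 0, 0, 1]

temperature = learn_temperature(calibration_probs, calibration_labels)
calibrated = rescale([0.99, 0.01], temperature)
```

A temperature above 1 softens overconfident outputs; thresholds on the rescaled probabilities could then be tuned on the same or a third labeled set, as the abstract suggests.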
    • 13. Invention patent application
    • SYSTEM AND METHOD FOR ASSISTED DOCUMENT REVIEW
    • US20100312725A1
    • 2010-12-09
    • US12479972
    • 2009-06-08
    • Caroline PRIVAULT, Jacki O'Neill, Jean-Michel Renders, Victor Ciriza, Yves Hoppenot
    • Caroline PRIVAULT, Jacki O'Neill, Jean-Michel Renders, Victor Ciriza, Yves Hoppenot
    • G06F15/18, G06N5/02
    • G06N5/043, G06Q10/10, G06Q50/18
    • A system and method for reviewing documents are provided. A collection of documents is partitioned into sets of documents for review by a plurality of reviewers. For each set, documents in the set are displayed on a display device for review by a reviewer and temporarily organized through grouping and sorting. The reviewer's labels for the displayed documents are received. Based on the reviewer's labels, a class from a plurality of classes is assigned to each of the reviewed documents. A classifier model stored in computer memory is progressively trained, based on features extracted from the reviewed documents in the set and their assigned classes. Prior to review of all documents in the set, a calculated subset of documents for which the classifier model assigns a class different from the one assigned based on the reviewer's label is returned for a second review by a reviewer. Models generated from one or more other document sets can be used to assess the review of a first of the sets.
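The review loop in the abstract above, in which a classifier is trained progressively on reviewer labels and documents where it disagrees with the reviewer are returned for a second look, can be sketched as follows. The tiny naive Bayes model, the class name `IncrementalNB`, the function `flag_for_second_review`, and the bag-of-words features are assumptions chosen for brevity; the abstract does not specify the classifier.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Minimal multinomial naive Bayes that can be updated one document at a time."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.class_counts = Counter()            # label -> number of documents
        self.vocab = set()

    def update(self, tokens, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n_docs / total_docs)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best, best_score = label, score
        return best


def flag_for_second_review(reviewed_batch):
    """Train on (doc_id, tokens, reviewer_label) triples reviewed so far and
    return the ids whose reviewer label disagrees with the model."""
    model = IncrementalNB()
    for _, tokens, label in reviewed_batch:
        model.update(tokens, label)
    return [doc_id for doc_id, tokens, label in reviewed_batch
            if model.predict(tokens) != label]


# Hypothetical partial review of one document set.
batch = [
    ("d1", ["contract", "merger"], "responsive"),
    ("d2", ["merger", "deal"], "responsive"),
    ("d3", ["lunch", "party"], "non-responsive"),
    ("d4", ["party", "cake"], "non-responsive"),
    ("d5", ["merger", "contract", "deal"], "non-responsive"),
]
second_look = flag_for_second_review(batch)
```

Calling the function on the documents reviewed so far, rather than on the whole set, mirrors the abstract's point that disagreements are surfaced before the review of the set is complete.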
    • 14. Invention patent application
    • Interactive cleaning for automatic document clustering and categorization
    • US20080249999A1
    • 2008-10-09
    • US11784321
    • 2007-04-06
    • Jean-Michel Renders, Caroline Privault, Ludovic Menuge
    • Jean-Michel Renders, Caroline Privault, Ludovic Menuge
    • G06F17/30
    • G06F17/3071, Y10S707/953
    • Documents are clustered or categorized to generate a model associating documents with classes. Outlier measures are computed for the documents indicative of how well each document fits into the model. Outlier documents are identified to a user based on the outlier measures and a user selected outlier criterion. Ambiguity measures are computed for the documents indicative of a number of classes with which each document has similarity under the model. If a document is annotated with a label class, a possible corrective label class is identified if the annotated document has higher similarity with the possible corrective label class under the model than with the annotated label class. The clustering or categorizing is repeated, adjusted based on received user input, to generate an updated model associating documents with classes. Outlier and ambiguity measures are also calculated at runtime for new documents classified using the model.
    • 18. Granted invention patent
    • Full and semi-batch clustering
    • US08880525B2
    • 2014-11-04
    • US13437079
    • 2012-04-02
    • Matthias Galle, Jean-Michel Renders
    • Matthias Galle, Jean-Michel Renders
    • G06F17/30
    • G06F17/30, G06F17/30707
    • A method for clustering documents is provided. Each document is represented by a multidimensional data point. Each data point is initially assigned to a respective cluster and serves as that cluster's initial representative point. Thereafter, in an iterative process, the data points are clustered among the clusters by assigning the data points to the clusters based on a comparison measure of each data point with the cluster or its representative point, and a threshold of the comparison measure. Based on this clustering, a new representative point for each of the clusters can be computed. Optionally, overlapping clusters are merged. For the next iteration, the new representative points are used as the representative points. An assignment of the documents to the clusters is output, based on a clustering of the data points in the latest iteration. Multiple batches may be processed, retaining the initial clusters to which the original batch was assigned.
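The iterative procedure in the abstract above (every point starts as its own representative, points are assigned by a thresholded comparison, representatives are recomputed, and the loop repeats) can be sketched as below. Several choices here are assumptions: Euclidean distance as the comparison measure, the centroid as the new representative point, merging only clusters whose membership has become identical, and a final nearest-representative assignment. The function names are hypothetical, and the multi-batch variant is not covered.

```python
import math


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def threshold_cluster(points, threshold, max_iter=20):
    """Return (labels, representatives) for a list of coordinate tuples."""
    reps = list(points)  # each point starts as its own representative
    clusters = [frozenset([i]) for i in range(len(points))]
    for _ in range(max_iter):
        merged = []
        for rep in reps:
            # A point joins every cluster whose representative is close enough.
            members = frozenset(
                i for i, p in enumerate(points) if math.dist(p, rep) <= threshold
            )
            # Simplified merge step: keep one copy of identical clusters.
            if members and members not in merged:
                merged.append(members)
        if merged == clusters:
            break  # memberships are stable
        clusters = merged
        reps = [centroid([points[i] for i in sorted(m)]) for m in clusters]
    # Output one cluster per document: the nearest representative.
    labels = [
        min(range(len(reps)), key=lambda k: math.dist(p, reps[k])) for p in points
    ]
    return labels, reps


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
labels, reps = threshold_cluster(points, threshold=2.0)
```

Because the number of clusters is never fixed in advance, the threshold alone controls granularity: a smaller value leaves more points in their own clusters.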
    • 19. Invention patent application
    • LARGE SCALE UNSUPERVISED HIERARCHICAL DOCUMENT CATEGORIZATION USING ONTOLOGICAL GUIDANCE
    • US20120203752A1
    • 2012-08-09
    • US13022766
    • 2011-02-08
    • Viet Ha-Thuc, Jean-Michel Renders
    • Viet Ha-Thuc, Jean-Michel Renders
    • G06F17/30
    • G06F17/30705
    • A classification method includes constructing queries from category descriptors representing categories of a taxonomy of hierarchically organized categories. The query constructed for a category c includes a query component based on descriptors of the category c and at least one query component based on descriptors of an ancestor or descendant category of the category c. A documents database is queried using the constructed queries to retrieve pseudo-relevant documents. Language models for the categories of the taxonomy are extracted from the pseudo-relevant documents by inferring a hierarchical topic model representing the taxonomy. An input document is classified by optimizing mixture weights of a weighted combination of categories of the hierarchical topic model respective to the input document.
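The query-construction step in the abstract above, where the query for a category combines its own descriptors with those of related categories in the taxonomy, can be sketched as follows. The taxonomy layout (a dict with `parent` and `descriptors` keys), the function name `build_query`, the geometric down-weighting of ancestors, and the restriction to ancestors (the abstract also allows descendants) are all assumptions made for illustration.

```python
def build_query(taxonomy, category, ancestor_weight=0.5):
    """Return {term: weight} for `category`, adding ancestor descriptors
    with a weight that shrinks at each level up the hierarchy."""
    query = {}
    weight = 1.0
    node = category
    while node is not None:
        for term in taxonomy[node]["descriptors"]:
            # Keep the largest weight if a term appears at several levels.
            query[term] = max(query.get(term, 0.0), weight)
        node = taxonomy[node]["parent"]
        weight *= ancestor_weight
    return query


# Hypothetical three-level taxonomy.
taxonomy = {
    "science": {"parent": None, "descriptors": ["science"]},
    "physics": {"parent": "science", "descriptors": ["physics", "mechanics"]},
    "optics": {"parent": "physics", "descriptors": ["optics", "lens"]},
}
optics_query = build_query(taxonomy, "optics")
```

The weighted terms would then be sent to a document search engine to retrieve pseudo-relevant documents for each category; that retrieval step and the topic-model inference are outside this sketch.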