    • 3. Granted invention patent
    • Automatic incremental labeling of document clusters
    • US09002848B1
    • 2015-04-07
    • US13530764
    • 2012-06-22
    • Jun Peng; Aner Ben-Artzi; Kirill Buryak; Glenn M. Lewis
    • G06F17/30
    • G06F17/30705; G06F17/3071
    • Methods and systems for use in labeling documents within a cluster are provided. One example method includes assembling a set of documents including a first plurality of previously clustered documents and a second plurality of documents. Each of the first plurality of previously clustered documents has at least one label identifying a topic to which content of the document relates. The method includes partitioning documents from the set of documents into multiple clusters, determining if a dominant topic exists within one of the multiple clusters, determining a metric value for one of the multiple clusters based on the number of documents within the one of the multiple clusters having a label identifying the determined dominant topic, and labeling at least documents from the second plurality of documents within the one of the multiple clusters with the label identifying the dominant topic when the metric value exceeds a predetermined threshold.
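The labeling step in the abstract above can be illustrated with a minimal sketch. It assumes the clustering has already been done and that the metric is the share of documents in a cluster carrying the dominant label; the `Doc` class, the function name, and the 0.5 threshold are hypothetical and are not taken from the patent.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Doc:
    text: str
    label: Optional[str] = None  # None marks a document that has no label yet


def propagate_dominant_label(cluster, threshold=0.5):
    """Label the unlabeled docs in `cluster` when one topic dominates it.

    Returns the propagated topic, or None when nothing was propagated.
    """
    counts = Counter(d.label for d in cluster if d.label is not None)
    if not counts:
        return None  # no labeled documents, so no dominant topic exists
    topic, n_with_topic = counts.most_common(1)[0]
    # Assumed metric: share of the cluster that carries the dominant label.
    metric = n_with_topic / len(cluster)
    if metric <= threshold:
        return None
    for d in cluster:
        if d.label is None:
            d.label = topic
    return topic


cluster = [Doc("a", "billing"), Doc("b", "billing"), Doc("c", "billing"), Doc("d")]
mixed = [Doc("a", "billing"), Doc("b", "login"), Doc("c"), Doc("d")]
propagated = propagate_dominant_label(cluster)
skipped = propagate_dominant_label(mixed)
```

In the first cluster three of four documents share a label, so the fourth inherits it; in the mixed cluster no label reaches the threshold and the unlabeled documents are left alone.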
    • 4. Granted invention patent
    • Contextual text interpretation
    • US08620918B1
    • 2013-12-31
    • US13364177
    • 2012-02-01
    • Aner Ben-Artzi; Kirill Buryak; Glenn M. Lewis; Jun Peng; Nadav Benbarak
    • G06F7/00; G06F17/30
    • G06F17/30707
    • Among other disclosed subject matter, a computer-implemented method includes receiving a plurality of electronic documents associated with a domain at a server. Each of the plurality of electronic documents includes meta-data and textual content. The method includes identifying one or more text strings in the textual content that are to be processed differently than an identical or similar text string in other electronic documents, and associating, with the electronic document, data indicating that each of the identified text strings is to be processed differently than an identical or similar text string in other electronic documents. The method also includes performing an analysis of the electronic documents to identify one or more subsets of the electronic documents that include related subject matter. A plurality of degrees of relatedness can be associated with text strings associated with data indicating that each of the text strings is to be processed differently.
    • 5. Granted invention patent
    • Methods and systems for classifying data using a hierarchical taxonomy
    • US09367814B1
    • 2016-06-14
    • US13530505
    • 2012-06-22
    • Glenn M. Lewis; Kirill Buryak; Aner Ben-Artzi; Jun Peng; Nadav Benbarak
    • G06N99/00
    • G06F17/30598; G06F17/30011; G06F17/30707; G06N5/022; G06N7/005; G06N99/005; G06Q10/00
    • A method and system for classifying documents is provided. A set of document classifiers is generated by applying a classification algorithm to a trusted corpus that includes a set of training documents representing a taxonomy. One or more of the generated document classifiers are executed against a plurality of input documents to create a plurality of classified documents. Each classified document is associated with a classification within the taxonomy and a classification confidence level. One or more classified documents that are associated with a classification confidence level below a predetermined threshold value are selected to create a set of low-confidence documents. The low-confidence documents are disassociated from each of the associated classifications. A user is prompted to enter a classification within the taxonomy for at least one low-confidence document. The low-confidence document is associated with the entered classification and with a predetermined confidence level to create a newly classified document.
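The low-confidence review loop described in the abstract above might look roughly like the following sketch. The `Classified` record, the `ask_user` callback, and the choice of resetting confidence to 0.0 on disassociation are illustrative assumptions; the patent does not specify them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Classified:
    doc_id: str
    category: Optional[str]  # classification within the taxonomy, if any
    confidence: float


def review_low_confidence(
    docs: List[Classified],
    threshold: float,
    ask_user: Callable[[str], Optional[str]],
    manual_confidence: float = 1.0,
) -> List[Classified]:
    """Re-classify documents whose confidence falls below `threshold`."""
    low = [d for d in docs if d.confidence < threshold]
    for d in low:
        d.category = None   # disassociate from the automatic classification
        d.confidence = 0.0  # assumed reset value, not stated in the abstract
    for d in low:
        entered = ask_user(d.doc_id)  # prompt for a category in the taxonomy
        if entered is not None:
            d.category = entered
            d.confidence = manual_confidence  # predetermined confidence level
    return low


docs = [Classified("d1", "billing", 0.92), Classified("d2", "login", 0.31)]
reviewed = review_low_confidence(docs, 0.5, ask_user=lambda doc_id: "shipping")
```

Passing the prompt as a callback keeps the sketch testable; a real system would presumably route this to a review interface.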
    • 6. Granted invention patent
    • Clustering internet resources
    • US08423551B1
    • 2013-04-16
    • US12940905
    • 2010-11-05
    • Aner Ben-Artzi; Kirill Buryak; Glenn M. Lewis; Jun Peng; Nadav Benbarak
    • G06F17/30
    • G06F17/30867
    • Among other disclosed subject matter, a computer-implemented method includes receiving one or more keywords and identifying a plurality of content items. The content items comprise network content that includes the one or more keywords. The method also includes clustering the plurality of content items and identifying a topic associated with each cluster. The method also includes determining a relative importance of a particular topic and analyzing clusters associated with the particular topic to determine opinion data associated with the particular topic. The method includes preparing a report based on the clusters, relative importance and the opinion data and displaying the report to a user.
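The reporting step in the abstract above can be sketched as follows, assuming clustering and topic identification have already produced a mapping from topic to content items. Relative importance is taken here as each topic's share of all items, and opinion as a count difference over a tiny hypothetical word lexicon; both definitions are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical opinion lexicon; a real system would use something richer.
POSITIVE = {"great", "love", "fast"}
NEGATIVE = {"slow", "broken", "hate"}


def build_report(clusters):
    """Build a per-topic report from `clusters` (topic -> list of item texts)."""
    total = sum(len(items) for items in clusters.values())
    report = []
    for topic, items in clusters.items():
        words = Counter(w for text in items for w in text.lower().split())
        positive_hits = sum(words[w] for w in POSITIVE)
        negative_hits = sum(words[w] for w in NEGATIVE)
        report.append({
            "topic": topic,
            # Assumed importance measure: share of all content items.
            "importance": len(items) / total if total else 0.0,
            # Assumed opinion score: positive minus negative word hits.
            "opinion": positive_hits - negative_hits,
        })
    # Most important topics first.
    return sorted(report, key=lambda row: row["importance"], reverse=True)


clusters = {
    "battery": ["battery is great", "love the battery", "battery slow to charge"],
    "screen": ["screen is broken"],
}
report = build_report(clusters)
```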
    • 7. Granted invention patent
    • Clustering internet messages
    • US08386487B1
    • 2013-02-26
    • US12940917
    • 2010-11-05
    • Aner Ben-Artzi; Kirill Buryak; Glenn M. Lewis; Jun Peng; Nadav Benbarak
    • G06F17/30
    • G06F17/30705; G06F17/30997
    • Among other disclosed subject matter, a computer-implemented method includes receiving a plurality of documents at a server and adding meta-data to each of the plurality of documents. The meta-data added to a particular document comprises at least one of task flow features of the particular document or data associated with an author of the particular document. The method also includes selecting a plurality of features for use in clustering the plurality of documents. The plurality of features includes a subset of the meta-data and a subset of content associated with one or more of the plurality of documents. The method also includes clustering the plurality of documents based on the plurality of features including identifying a topic associated with each cluster, and preparing a report based on the clusters and metric information associated with each cluster. The method also includes displaying the report to a user.
    • 9. Granted invention patent
    • Method and system for document classification
    • US08977620B1
    • 2015-03-10
    • US13531049
    • 2012-06-22
    • Kirill Buryak; Aner Ben-Artzi; Glenn M. Lewis; Jun Peng
    • G06F17/30
    • G06F17/30707; G06F17/30705; G06F17/3071
    • A method and system of classifying documents is provided. The method includes receiving a plurality of documents from at least one user, wherein each document includes information relating to a customer support issue or sentiment and identifying at least one customer support issue or sentiment contained within each document. The method also includes classifying the documents satisfying a confidence threshold using a classifier, clustering the remainder of the plurality of documents into groups using a clustering engine, the clustering engine applying a word analysis, and outputting a frequency of each identified customer support issue or sentiment, the frequency based on the classifying or the clustering.
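The classify-then-cluster flow in the abstract above can be approximated with a toy sketch. The keyword-hit classifier, the Jaccard-based greedy grouping standing in for the "clustering engine applying a word analysis", and all thresholds are stand-ins chosen for illustration, not the patented method.

```python
from collections import Counter


def classify(text, keywords):
    """Toy classifier: return (issue, confidence) from keyword hits."""
    words = text.lower().split()
    hits = {issue: sum(w in kws for w in words) for issue, kws in keywords.items()}
    issue = max(hits, key=hits.get)
    confidence = hits[issue] / len(words) if words else 0.0
    return issue, confidence


def jaccard(a, b):
    """Word-set overlap between two documents."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_words(texts, min_similarity=0.3):
    """Greedy grouping: join the first cluster whose seed is similar enough."""
    clusters = []  # pairs of (seed word set, member texts)
    for text in texts:
        words = set(text.lower().split())
        for seed, members in clusters:
            if jaccard(words, seed) >= min_similarity:
                members.append(text)
                break
        else:
            clusters.append((words, [text]))
    return [members for _, members in clusters]


def issue_frequencies(texts, keywords, threshold=0.25):
    """Count issues from confident classifications, then from clusters."""
    freq = Counter()
    remainder = []
    for text in texts:
        issue, confidence = classify(text, keywords)
        if confidence >= threshold:
            freq[issue] += 1
        else:
            remainder.append(text)  # left for the clustering step
    for i, members in enumerate(cluster_by_words(remainder)):
        freq[f"cluster-{i}"] = len(members)
    return freq


keywords = {"billing": {"refund", "charge", "invoice"}, "login": {"password", "login"}}
texts = [
    "refund my charge",
    "password reset login fails",
    "app crashes on startup",
    "app crashes on launch",
]
freq = issue_frequencies(texts, keywords)
```

The two crash reports match no keyword, so they fall through to clustering and are counted as one unnamed group.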
    • 10. Granted invention patent
    • Methods and systems for organizing content
    • US08972404B1
    • 2015-03-03
    • US13531081
    • 2012-06-22
    • Glenn M. Lewis; Kirill Buryak; Aner Ben-Artzi; Jun Peng; Nadav Benbarak
    • G06F17/30
    • G06F17/30705; G06F17/3071
    • A computer-implemented method executes instructions stored on a computer-readable medium. The method includes accessing a hierarchy of clusters, wherein each cluster includes at least one content file, and a label is associated with each cluster. The method further includes calculating a topic purity score for each cluster, and selecting a first cluster and a second cluster from the hierarchy of clusters, wherein the topic purity score of the first cluster and the second cluster are less than a purity threshold. The method also includes creating a third cluster by combining the content files included within the first cluster and the second cluster, determining a parent category of the first cluster and the second cluster, wherein the parent category is at a level within the hierarchy higher than a level of the first cluster and the second cluster, and associating a label of the parent category with the third cluster.
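The merge step in the abstract above can be sketched as follows. The abstract does not define the topic purity score, so this example assumes it is the share of content files carrying the cluster's most common topic; the `Cluster` class, the flat `parent` field, and the 0.6 threshold are likewise hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    label: str
    parent: str  # label of the parent category one level up in the hierarchy
    topics: List[str] = field(default_factory=list)  # one topic tag per file


def purity(cluster):
    """Assumed purity score: share of files with the most common topic."""
    if not cluster.topics:
        return 0.0
    top_count = Counter(cluster.topics).most_common(1)[0][1]
    return top_count / len(cluster.topics)


def merge_impure_siblings(clusters, threshold=0.6):
    """Merge the first two low-purity clusters that share a parent category."""
    impure = [c for c in clusters if purity(c) < threshold]
    for i, first in enumerate(impure):
        for second in impure[i + 1:]:
            if first.parent == second.parent:
                # The merged cluster takes the parent category's label.
                # Reusing the parent as its own `parent` is a simplification.
                return Cluster(
                    label=first.parent,
                    parent=first.parent,
                    topics=first.topics + second.topics,
                )
    return None


clusters = [
    Cluster("iphone", "phones", ["ios", "android", "ios", "battery"]),
    Cluster("pixel", "phones", ["android", "ios"]),
    Cluster("laptops", "computers", ["laptop", "laptop", "laptop"]),
]
merged = merge_impure_siblings(clusters)
```

Here both phone clusters score 0.5, below the threshold, so their files are combined under the shared parent label, while the pure laptop cluster is untouched.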