    • 1. Granted invention patent
    • Adding prototype information into probabilistic models
    • US08010341B2
    • 2011-08-30
    • US11855099
    • 2007-09-13
    • Kannan Achan; Moises Goldszmidt; Lev Ratinov
    • Kannan Achan; Moises Goldszmidt; Lev Ratinov
    • G06F17/27; G06F17/20; G06F15/18; G10L15/06; G10L15/14
    • G10L15/142; G06F17/2715; G06K9/6297
    • Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
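As a rough illustration of the word-labeling example in the abstract above, the following sketch injects prototype words into an HMM as emission pseudo-counts and then decodes labels with Viterbi. This is only one plausible way to encode such prior knowledge and is not taken from the patent: the function names, the `boost` and `base` pseudo-counts, the uniform start distribution, and the "sticky" transition model are all assumptions made for the example.

```python
import math

def emission_probs(vocab, labels, prototypes, boost=5.0, base=1.0):
    """Build P(word | label) from pseudo-counts only (no training data).

    Every word gets `base` mass under every label; a prototype word gets
    an extra `boost` under its own label. Both values are assumptions.
    """
    table = {}
    for label in labels:
        protos = prototypes.get(label, set())
        counts = {w: base + (boost if w in protos else 0.0) for w in vocab}
        total = sum(counts.values())
        table[label] = {w: c / total for w, c in counts.items()}
    return table

def viterbi(words, labels, emit, stay=0.7):
    """Most likely label sequence; needs at least two labels.

    Uniform start distribution and a "sticky" transition model: keep the
    current label with probability `stay`, otherwise switch uniformly.
    """
    switch = (1.0 - stay) / (len(labels) - 1)

    def log_trans(prev, cur):
        return math.log(stay if prev == cur else switch)

    score = {l: -math.log(len(labels)) + math.log(emit[l][words[0]]) for l in labels}
    back = []
    for w in words[1:]:
        new_score, pointers = {}, {}
        for cur in labels:
            # Best predecessor label for `cur` at this position.
            best = max(labels, key=lambda prev: score[prev] + log_trans(prev, cur))
            pointers[cur] = best
            new_score[cur] = score[best] + log_trans(best, cur) + math.log(emit[cur][w])
        score = new_score
        back.append(pointers)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda l: score[l])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With hypothetical prototypes such as `{"PRICE": {"rent", "$"}, "ADDRESS": {"street", "avenue"}}`, the prototype words in a sequence like `rent $ near street avenue` are decoded with their own labels, while a non-prototype word such as `near` is labeled only through its neighbours and can go either way. A fuller treatment would combine these pseudo-counts with counts estimated from a corpus.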
    • 2. Invention patent application
    • ADDING PROTOTYPE INFORMATION INTO PROBABILISTIC MODELS
    • US20090076794A1
    • 2009-03-19
    • US11855099
    • 2007-09-13
    • Kannan Achan; Moises Goldszmidt; Lev Ratinov
    • Kannan Achan; Moises Goldszmidt; Lev Ratinov
    • G06F17/27; G10L15/14; G10L15/18
    • G10L15/142; G06F17/2715; G06K9/6297
    • Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.
    • 3. Invention patent application
    • TEXT CATEGORIZATION WITH KNOWLEDGE TRANSFER FROM HETEROGENEOUS DATASETS
    • US20090171956A1
    • 2009-07-02
    • US12249809
    • 2008-10-10
    • Rakesh Gupta; Lev Ratinov
    • Rakesh Gupta; Lev Ratinov
    • G06F17/30
    • G06F17/30705
    • The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. A plurality of heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. A plurality of features are extracted from each of the plurality of heterogeneous auxiliary datasets. The plurality of features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.
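The last step of the abstract above, keeping only features whose mutual information exceeds a threshold, can be sketched as follows. This is a minimal illustration rather than the patented procedure: it assumes each feature is a binary "token present in document" indicator, measures mutual information against the class label in bits, and uses hypothetical names (`mutual_information`, `select_features`). The extraction of features from auxiliary datasets is not modeled here.

```python
import math
from collections import Counter

def mutual_information(feature_present, labels):
    """Mutual information (in bits) between a binary feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature_present, labels))
    f_counts = Counter(feature_present)
    y_counts = Counter(labels)
    mi = 0.0
    for (f, y), c in joint.items():
        # p(f, y) * log2( p(f, y) / (p(f) * p(y)) ), written with raw counts.
        mi += (c / n) * math.log2(c * n / (f_counts[f] * y_counts[y]))
    return mi

def select_features(docs, labels, threshold):
    """Return {token: score} for tokens whose score exceeds `threshold`.

    `docs` is a list of token sets, one per labeled document.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    scores = {
        tok: mutual_information([tok in doc for doc in docs], labels)
        for tok in vocab
    }
    return {tok: s for tok, s in scores.items() if s > threshold}
```

A token that appears in exactly the documents of one class scores high, while a token spread evenly across classes scores zero and is dropped. The threshold itself is a tuning choice that the abstract does not specify.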
    • 4. Granted invention patent
    • Systems and methods of detecting keyword-stuffed business titles
    • US08473491B1
    • 2013-06-25
    • US12959783
    • 2010-12-03
    • Baris Yuksel; Lev Ratinov
    • Baris Yuksel; Lev Ratinov
    • G06F17/00
    • G06Q30/0185; G06F17/30528; G06F17/30864; G06F2221/2101; G06Q10/06
    • The present invention relates generally to identifying fraudulent businesses and business listings. More specifically, the invention relates to determining a “surprisingness” value for a particular combination of words in a business title based on the likelihood that the combination has appeared in legitimate business titles. The value may be used to determine whether the business or business listing is legitimate or fraudulent. For example, third party hijackers may “keyword-stuff” business titles or attempt to include words associated with prominent businesses in a title of a less prominent business associated with the third party in order to have the less prominent business displayed more often in search results for the prominent business. For example, if a business title has too many surprising word combinations or a particular combination is highly unlikely, the business listing is likely to be fraudulent or “keyword-stuffed” and may be withheld, excluded, or removed from search results.
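One way to picture the "surprisingness" idea in the abstract above is to score each word pair in a title by how rarely that pair occurs in a reference set of legitimate titles, then flag titles with too many high-scoring pairs. The sketch below is an assumption-laden illustration, not the patented scoring: it uses unordered word pairs, add-one smoothing, a negative log2 score, and hypothetical names and thresholds (`pair_counts`, `surprisingness`, `looks_stuffed`, `pair_threshold`, `max_surprising`).

```python
import math
from collections import Counter
from itertools import combinations

def title_pairs(title):
    """Unordered word pairs of a title, in a canonical (sorted) order."""
    words = sorted(set(title.lower().split()))
    return list(combinations(words, 2))

def pair_counts(legit_titles):
    """Count, for each word pair, how many legitimate titles contain it."""
    counts = Counter()
    for title in legit_titles:
        counts.update(title_pairs(title))
    return counts

def surprisingness(title, counts, total_titles):
    """Map each pair in `title` to -log2 of its smoothed frequency.

    Add-one smoothing keeps unseen pairs finite; rarer pairs score higher.
    """
    return {
        pair: -math.log2((counts[pair] + 1) / (total_titles + 2))
        for pair in title_pairs(title)
    }

def looks_stuffed(title, counts, total_titles, pair_threshold, max_surprising):
    """Flag a title when more than `max_surprising` pairs reach `pair_threshold`."""
    scores = surprisingness(title, counts, total_titles)
    surprising = sum(1 for s in scores.values() if s >= pair_threshold)
    return surprising > max_surprising
```

In practice the reference counts would come from a large corpus of listings believed to be legitimate, and both thresholds would need tuning against known fraudulent examples.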
    • 5. Granted invention patent
    • Text categorization with knowledge transfer from heterogeneous datasets
    • US08103671B2
    • 2012-01-24
    • US12249809
    • 2008-10-10
    • Rakesh Gupta; Lev Ratinov
    • Rakesh Gupta; Lev Ratinov
    • G06F7/00; G06F17/30
    • G06F17/30705
    • The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. Heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. Features are extracted from each of the heterogeneous auxiliary datasets. The features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.