Quick Patent Search - Fast Search of Global Patents, Free Commercial Patent Database - IPRDB

1. Granted invention patent

US08010341B2 Adding prototype information into probabilistic models (in force)
Publication (announcement) number: US08010341B2
Publication (announcement) date: 2011-08-30
Application number: US11855099
Filing date: 2007-09-13
Applicants: Kannan Achan, Moises Goldszmidt, Lev Ratinov
Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
IPC classification: G06F17/27, G06F17/20, G06F15/18, G10L15/06, G10L15/14
CPC classification: G10L15/142, G06F17/2715, G06K9/6297
Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.

2. Invention patent application

US20090076794A1 ADDING PROTOTYPE INFORMATION INTO PROBABILISTIC MODELS (in force)
Publication (announcement) number: US20090076794A1
Publication (announcement) date: 2009-03-19
Application number: US11855099
Filing date: 2007-09-13
Applicants: Kannan Achan, Moises Goldszmidt, Lev Ratinov
Inventors: Kannan Achan, Moises Goldszmidt, Lev Ratinov
IPC classification: G06F17/27, G10L15/14, G10L15/18
CPC classification: G10L15/142, G06F17/2715, G06K9/6297
Abstract: Mechanisms are disclosed for incorporating prototype information into probabilistic models for automated information processing, mining, and knowledge discovery. Examples of these models include Hidden Markov Models (HMMs), Latent Dirichlet Allocation (LDA) models, and the like. The prototype information injects prior knowledge to such models, thereby rendering them more accurate, effective, and efficient. For instance, in the context of automated word labeling, additional knowledge is encoded into the models by providing a small set of prototypical words for each possible label. The net result is that words in a given corpus are labeled and are therefore in condition to be summarized, identified, classified, clustered, and the like.

3. Invention patent application

US20090171956A1 TEXT CATEGORIZATION WITH KNOWLEDGE TRANSFER FROM HETEROGENEOUS DATASETS (in force)
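Publication (announcement) number: US20090171956A1
Publication (announcement) date: 2009-07-02
Application number: US12249809
Filing date: 2008-10-10
Applicants: Rakesh Gupta, Lev Ratinov
Inventors: Rakesh Gupta, Lev Ratinov
IPC classification: G06F17/30
CPC classification: G06F17/30705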
Abstract: The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification, a plurality of heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. A plurality of features are extracted from each of the plurality of heterogeneous auxiliary datasets. The plurality of features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.

4. Granted invention patent

US08473491B1 Systems and methods of detecting keyword-stuffed business titles (in force)
Publication (announcement) number: US08473491B1
Publication (announcement) date: 2013-06-25
Application number: US12959783
Filing date: 2010-12-03
Applicants: Baris Yuksel, Lev Ratinov
Inventors: Baris Yuksel, Lev Ratinov
IPC classification: G06F17/00
CPC classification: G06Q30/0185, G06F17/30528, G06F17/30864, G06F2221/2101, G06Q10/06
Abstract: The present invention relates generally to identifying fraudulent businesses and business listings. More specifically, the invention relates to determining a “surprisingness” value for a particular combination of words in a business title based on the likelihood that the combination has appeared in legitimate business titles. The value may be used to determine whether the business or business listing is legitimate or fraudulent. For example, third party hijackers may “keyword-stuff” business titles or attempt to include words associated with prominent businesses in a title of a less prominent business associated with the third party in order to have the less prominent business displayed more often in search results for the prominent business. For example, if a business title has too many surprising word combinations or a particular combination is highly unlikely, the business listing is likely to be fraudulent or “keyword-stuffed” and may be withheld, excluded, removed from search results.
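
As a rough illustration of the idea in this abstract, the sketch below scores each word pair in a title by how rarely that pair appears in a set of known-legitimate titles, then flags titles with too many high-scoring pairs. The abstract does not specify the scoring formula, so everything here is an assumption made for illustration: the use of word pairs, the add-one smoothing, the negative log probability score, and the hypothetical names `pair_counts`, `surprisingness`, `looks_stuffed`, `surprising_bits`, and `max_surprising_pairs`.

```python
import math
from collections import Counter
from itertools import combinations


def title_pairs(title):
    """Return the unordered word pairs in a title (lowercased, de-duplicated)."""
    words = sorted(set(title.lower().split()))
    return list(combinations(words, 2))


def pair_counts(legit_titles):
    """Count in how many legitimate titles each word pair occurs."""
    counts = Counter()
    for title in legit_titles:
        counts.update(title_pairs(title))
    return counts


def surprisingness(title, counts, n_titles):
    """Map each word pair in the title to -log2 of its smoothed probability.

    Assumption: add-one smoothing, so unseen pairs get a finite score.
    """
    return {
        pair: -math.log2((counts[pair] + 1) / (n_titles + 2))
        for pair in title_pairs(title)
    }


def looks_stuffed(title, counts, n_titles, surprising_bits, max_surprising_pairs):
    """Flag a title when too many of its word pairs are surprising."""
    scores = surprisingness(title, counts, n_titles)
    n_surprising = sum(1 for s in scores.values() if s >= surprising_bits)
    return n_surprising > max_surprising_pairs


# Hypothetical sample data, not taken from the patent.
legit = ["joes pizza", "joes pizza downtown", "city locksmith", "city locksmith services"]
counts = pair_counts(legit)
print(looks_stuffed("city locksmith pizza hilton", counts, len(legit), 2.0, 2))
```

Both thresholds would need tuning against real listings; a production system would presumably also handle the case where a single combination is extremely unlikely, which this sketch leaves out.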

5. Granted invention patent

US08103671B2 Text categorization with knowledge transfer from heterogeneous datasets (in force)
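Publication (announcement) number: US08103671B2
Publication (announcement) date: 2012-01-24
Application number: US12249809
Filing date: 2008-10-10
Applicants: Rakesh Gupta, Lev Ratinov
Inventors: Rakesh Gupta, Lev Ratinov
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30705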
Abstract: The present invention provides a method for incorporating features from heterogeneous auxiliary datasets into input text data for use in classification. Heterogeneous auxiliary datasets, such as labeled datasets and unlabeled datasets, are accessed after receiving input text data. Features are extracted from each of the heterogeneous auxiliary datasets. The features are combined with the input text data to generate a set of features which may potentially be used to classify the input text data. Classification features are then extracted from the set of features and used to classify the input text data. In one embodiment, the classification features are extracted by calculating a mutual information value associated with each feature in the set of features and identifying features having a mutual information value exceeding a threshold value.
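
The last step of this abstract, keeping only features whose mutual information exceeds a threshold, can be sketched as follows. This is a minimal illustration and not the patented implementation. It assumes binary presence/absence features, documents represented as sets of tokens, and mutual information measured in bits against the class label. The function names, the `threshold` parameter, and the sample data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, feature):
    """Mutual information (bits) between presence of `feature` and the label."""
    n = len(docs)
    joint = Counter((feature in doc, label) for doc, label in zip(docs, labels))
    present = Counter(feature in doc for doc in docs)
    classes = Counter(labels)
    mi = 0.0
    for (is_present, label), count in joint.items():
        p_joint = count / n
        p_indep = (present[is_present] / n) * (classes[label] / n)
        mi += p_joint * math.log2(p_joint / p_indep)
    return mi


def select_features(docs, labels, threshold):
    """Return {feature: score} for features whose MI exceeds the threshold."""
    vocab = set().union(*docs)
    scores = {f: mutual_information(docs, labels, f) for f in vocab}
    return {f: s for f, s in scores.items() if s > threshold}


# Hypothetical toy corpus: each document is a set of tokens.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sports", "sports", "politics", "politics"]
print(sorted(select_features(docs, labels, threshold=0.5)))
```

In this toy corpus a token such as "team", which appears equally often in both classes, carries no information about the label and is dropped, while tokens that appear in only one class are kept.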
