    • 22. Granted invention patent
    • User modification of generative model for determining topics and sentiments
    • US09015035B2
    • 2015-04-21
    • US13545363
    • 2012-07-10
    • Divna Djordjevic, Rayid Ghani, Marko Krema
    • G06F17/21; G06F17/30; G06F17/27; G06Q10/00
    • G06F17/2765; G06Q10/00
    • A generative model is used to develop at least one topic model and at least one sentiment model for a body of text. The at least one topic model is displayed such that, in response, a user may provide user input indicating modifications to the at least one topic model. Based on the received user input, the generative model is used to provide at least one updated topic model and at least one updated sentiment model based on the user input. Thereafter, the at least one updated topic model may again be displayed in order to solicit further user input, which further input is then used to once again update the models. The at least one updated topic model and the at least one updated sentiment model may be employed to analyze target text in order to identify topics and associated sentiments therein.
    • 23. Granted invention patent
    • Determination of a basis for a new domain model based on a plurality of learned models
    • US08620837B2
    • 2013-12-31
    • US13179741
    • 2011-07-11
    • Rayid Ghani, Marko Krema
    • G06F15/18
    • G06N99/005
    • In a machine learning system in which a plurality of learned models, each corresponding to a unique domain, already exist, new domain input for training a new domain model may be provided. Statistical characteristics of features in the new domain input are first determined. The resulting new domain statistical characteristics are then compared with statistical characteristics of features in prior input previously provided for training at least some of the plurality of learned models. Thereafter, at least one learned model of the plurality of learned models is identified as the basis for the new domain model when the new domain input statistical characteristics compare favorably with the statistical characteristics of the features in the prior input corresponding to the at least one learned model.
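The comparison described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: it assumes the "statistical characteristics" are relative token frequencies, that "compares favorably" means cosine similarity above a cutoff, and the names `feature_stats`, `pick_basis`, the `threshold` value, and the sample data are all hypothetical.

```python
import math
from collections import Counter

def feature_stats(docs):
    """Relative frequency of each token over a list of documents (assumed statistic)."""
    counts = Counter(tok for doc in docs for tok in doc.lower().split())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse frequency dictionaries."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pick_basis(new_docs, prior_inputs, threshold=0.5):
    """Return the prior domain most similar to the new input, or None if below threshold."""
    new_stats = feature_stats(new_docs)
    scores = {domain: cosine(new_stats, feature_stats(docs))
              for domain, docs in prior_inputs.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

# Hypothetical prior training input for two already-learned domain models.
prior_inputs = {
    "cameras": ["great lens and sharp zoom", "battery life is short but lens is sharp"],
    "hotels": ["clean room and friendly staff", "the staff were rude and the room was small"],
}
new_docs = ["the zoom lens is sharp", "poor battery but great screen"]

basis, scores = pick_basis(new_docs, prior_inputs)
print(basis, scores)
```

Here the new input shares far more vocabulary with the camera domain than with the hotel domain, so the camera model would be chosen as the starting point for the new domain model.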
    • 24. Granted invention patent
    • Preprocessing of text
    • US08620836B2
    • 2013-12-31
    • US12987469
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F15/18
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
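The split, cluster, and filter steps in this abstract can be sketched as below. The abstract does not name a clustering algorithm, so this sketch swaps in a simple single-pass ("leader") clustering over bag-of-words vectors, and measures relevance as cosine similarity between a segment and the rest of its cluster. The function `filter_text`, both thresholds, and the sample text are assumptions made for illustration.

```python
import math
import re
from collections import Counter

def vec(segment):
    """Bag-of-words vector for one segment."""
    return Counter(re.findall(r"[a-z0-9]+", segment.lower()))

def cosine(a, b):
    dot = sum(n * b[t] for t, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_text(text, join_at=0.3, min_relevance=0.2):
    # 1. Split the body of text into sentence segments.
    segments = re.split(r"(?<=[.!?])\s+", text.strip())
    vectors = [vec(s) for s in segments]

    # 2. Single-pass clustering: join the most similar cluster or start a new one.
    clusters = []  # each cluster is a list of segment indices
    for i, v in enumerate(vectors):
        best, best_sim = None, join_at
        for cluster in clusters:
            centroid = sum((vectors[j] for j in cluster), Counter())
            sim = cosine(v, centroid)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)

    # 3. Keep only segments that are relevant to the rest of their cluster.
    kept = []
    for cluster in clusters:
        for i in cluster:
            rest = sum((vectors[j] for j in cluster if j != i), Counter())
            if cosine(vectors[i], rest) >= min_relevance:
                kept.append(i)
    return " ".join(segments[i] for i in sorted(kept))

text = ("The camera has a 12 megapixel sensor. "
        "The sensor captures sharp 12 megapixel photos. "
        "Free shipping on all orders today. "
        "The camera lens offers sharp zoom.")
filtered = filter_text(text)
print(filtered)
```

In this toy input the shipping sentence ends up alone in its own cluster, gets a relevance of zero, and is dropped, while the three product sentences are kept in their original order.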
    • 25. Granted invention patent
    • Identification of attributes and values using multiple classifiers
    • US08504492B2
    • 2013-08-06
    • US12987505
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F15/18
    • G06F17/30705
    • A body of text comprises a plurality of unknown attributes and a plurality of unknown values. A first classification sub-component labels a first portion of the plurality of unknown values as a first set of values, whereas a second classification sub-component labels a portion of the plurality of unknown attributes as a set of attributes and a second portion of the plurality of unknown values as a second set of values. Learning models implemented by the first and second classification sub-components are updated based on the set of attributes and the first and second sets of values. The first classification sub-component implements at least one supervised classification technique, whereas the second classification sub-component implements an unsupervised and/or semi-supervised classification technique. Active learning may be employed to provide at least one of a corrected attribute and/or corrected value that may be used to update the learning models.
    • 26. Invention application
    • Sentiment classifiers based on feature extraction
    • US20130018824A1
    • 2013-01-17
    • US13179707
    • 2011-07-11
    • Rayid Ghani, Marko Krema
    • G06F15/18
    • G06N99/005
    • Method and apparatus are provided for providing one or more sentiment classifiers from training data using supervised classification techniques based on features extracted from the training data. Training data includes a plurality of units such as, but not limited to, documents, paragraphs, sentences, and clauses. A feature extraction component extracts a plurality of features from the training data, and a feature value determination component determines a value for each extracted feature based on a frequency at which each feature occurs in the training data. On the other hand, a class labeling component labels each unit of the training data according to a plurality of sentiment classes to provide labeled training data. Thereafter, a sentiment classifier generation component provides at least one sentiment classifier based on the value of each extracted feature and the labeled training data using a supervised classification technique.
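As a rough illustration of the pipeline in this abstract (feature extraction, frequency-based feature values, labeled units, supervised classifier), the sketch below uses unigram counts and a multinomial Naive Bayes classifier with add-one smoothing. The abstract does not say which supervised technique is used, so Naive Bayes is an assumption here, as are the function names and the tiny training set.

```python
import math
from collections import Counter, defaultdict

def extract_features(unit):
    """Unigram features with their frequency in the unit (assumed feature type)."""
    return Counter(unit.lower().split())

def train(labeled_units):
    """Accumulate per-class feature frequencies from (text, sentiment) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_units:
        feats = extract_features(text)
        class_counts[label] += 1
        feature_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feature_counts, vocab

def classify(model, unit):
    """Return the sentiment class with the highest smoothed log-probability."""
    class_counts, feature_counts, vocab = model
    total_units = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total = sum(feature_counts[label].values())
        score = math.log(class_counts[label] / total_units)
        for feat, n in extract_features(unit).items():
            if feat in vocab:  # ignore features never seen in training
                p = (feature_counts[label][feat] + 1) / (total + len(vocab))
                score += n * math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical labeled training units (sentences).
training = [
    ("great battery and sharp screen", "positive"),
    ("love the sharp screen", "positive"),
    ("terrible battery and poor screen", "negative"),
    ("poor build and terrible support", "negative"),
]
model = train(training)
print(classify(model, "sharp screen love it"))
print(classify(model, "terrible support"))
```

The units here are sentences, but the same code would apply to documents, paragraphs, or clauses, since a unit is just a string.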
    • 27. Granted invention patent
    • Determination of a profile of an entity based on product descriptions
    • US08117199B2
    • 2012-02-14
    • US12710832
    • 2010-02-23
    • Rayid Ghani, Andrew E. Fano
    • G06F7/00; G06F17/30; G06F17/00
    • G06Q30/02; G06Q30/0204; G06Q30/0255; G06Q30/0631; Y10S707/944
    • Relative to a given product or products, one or more attributes and, for each attribute, a plurality of possible attribute values, are defined. For a given product and attribute, one or more descriptions of the product are obtained and analyzed to determine the correspondence of the description(s), and hence the product itself, to each of the plurality of possible attribute values. In one embodiment, this analysis is based on previously-labeled training data. A knowledge base can be populated with information identifying the products and their correspondence to the plurality of possible attribute values for each attribute. This technique may be used to develop a profile of an entity, which in turn may be used to develop appropriate marketing messages or recommendations for other products.
    • 28. Granted invention patent
    • Data anonymization based on guessing anonymity
    • US08627483B2
    • 2014-01-07
    • US12338483
    • 2008-12-18
    • Yaron Rachlin, Katherine Probst, Rayid Ghani
    • G06F7/04
    • G06F21/60; G06F21/6254
    • Privacy is defined in the context of a guessing game based on the so-called guessing inequality. The privacy of a sanitized record, i.e., guessing anonymity, is defined by the number of guesses an attacker needs to correctly guess an original record used to generate a sanitized record. Using this definition, optimization problems are formulated that optimize a second anonymization parameter (privacy or data distortion) given constraints on a first anonymization parameter (data distortion or privacy, respectively). Optimization is performed across a spectrum of possible values for at least one noise parameter within a noise model. Noise is then generated based on the noise parameter value(s) and applied to the data, which may comprise real and/or categorical data. Prior to anonymization, the data may have identifiers suppressed, whereas outlier data values in the noise perturbed data may be likewise modified to further ensure privacy.
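The idea of counting an attacker's guesses can be sketched as follows. This is an illustrative simplification, not the optimization described in the abstract: it assumes numeric records, independent zero-mean Gaussian noise with one shared standard deviation `sigma`, and an attacker who tries candidate originals from nearest to farthest (under that noise model and a uniform prior, this is the most-likely-first order). All names and data are hypothetical.

```python
import random

def sanitize(record, sigma, rng):
    """Perturb each field with zero-mean Gaussian noise of standard deviation sigma."""
    return [x + rng.gauss(0.0, sigma) for x in record]

def guesses_needed(sanitized, originals, true_index):
    """Number of guesses until the true original is hit, trying nearest candidates first."""
    def sq_dist(i):
        return sum((a - b) ** 2 for a, b in zip(sanitized, originals[i]))
    order = sorted(range(len(originals)), key=sq_dist)
    return order.index(true_index) + 1

def mean_guesses(originals, true_index, sigma, trials=200, seed=0):
    """Average guess count over repeated random sanitizations of one record."""
    rng = random.Random(seed)
    total = sum(
        guesses_needed(sanitize(originals[true_index], sigma, rng), originals, true_index)
        for _ in range(trials)
    )
    return total / trials

originals = [[30.0, 5.0], [31.0, 5.2], [45.0, 9.0]]

# Fixed perturbations of record 0, to show how the count changes with distortion.
print(guesses_needed([30.1, 5.0], originals, 0))  # lightly perturbed
print(guesses_needed([31.2, 5.3], originals, 0))  # perturbed past a neighbouring record

# Sweep the noise parameter: more noise costs more distortion but tends to need more guesses.
for sigma in (0.1, 1.0, 5.0):
    print(sigma, mean_guesses(originals, 0, sigma))
```

A sweep like the last loop is one simple way to pick the smallest noise level that meets a required average guess count, which loosely mirrors the trade-off between privacy and data distortion described above.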
    • 30. Granted invention patent
    • Classification-based redaction in natural language text
    • US08938386B2
    • 2015-01-20
    • US13048003
    • 2011-03-15
    • Chad Cumby, Rayid Ghani
    • G06F17/20; G06F17/27; G10L15/18; G06F17/00; G06F17/21
    • G06F17/2785; G06F17/21; G06F17/2765
    • When redacting natural language text, a classifier is used to provide a sensitive concept model according to features in natural language text and in which the various classes employed are sensitive concepts reflected in the natural language text. Similarly, the classifier is used to provide a utility concepts model based on utility concepts. Based on these models, and for at least one identified sensitive concept and at least one identified utility concept, at least one feature in the natural language text is identified that implicates the at least one identified sensitive concept more than the at least one identified utility concept. At least some of the features thus identified may be perturbed such that the modified natural language text may be provided as at least one redacted document. In this manner, features are perturbed to maximize classification error for sensitive concepts while simultaneously minimizing classification error in the utility concepts.
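The feature-selection step in this abstract can be sketched in a few lines. The sketch assumes each concept model has already been reduced to per-token weights (for example, from a linear classifier), and that "perturbing" a feature means masking it. The weight tables, the `redact` function, and the mask string are hypothetical.

```python
def redact(text, sensitive_w, utility_w, mask="[REDACTED]"):
    """Mask tokens that implicate the sensitive concept more than the utility concept."""
    out = []
    for tok in text.split():
        key = tok.lower().strip(".,;:")
        if sensitive_w.get(key, 0.0) > utility_w.get(key, 0.0):
            out.append(mask)
        else:
            out.append(tok)
    return " ".join(out)

# Hypothetical per-token weights for one sensitive concept (a diagnosis)
# and one utility concept (a lifestyle plan).
sensitive_w = {"insulin": 2.1, "glucose": 1.7, "diet": 0.4}
utility_w = {"diet": 0.9, "exercise": 1.1}

text = "Patient takes insulin daily and follows a diet and exercise plan."
redacted = redact(text, sensitive_w, utility_w)
print(redacted)
```

The token "diet" carries some weight for the sensitive concept but more for the utility concept, so it is kept, whereas "insulin" points only to the sensitive concept and is masked.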