    • 1. Granted invention patent
    • Classification-based redaction in natural language text
    • US08938386B2
    • 2015-01-20
    • US13048003
    • 2011-03-15
    • Chad Cumby; Rayid Ghani
    • Chad Cumby; Rayid Ghani
    • G06F17/20; G06F17/27; G10L15/18; G06F17/00; G06F17/21
    • G06F17/2785; G06F17/21; G06F17/2765
    • When redacting natural language text, a classifier is used to provide a sensitive concept model according to features in natural language text and in which the various classes employed are sensitive concepts reflected in the natural language text. Similarly, the classifier is used to provide a utility concept model based on utility concepts. Based on these models, and for one or more identified sensitive concepts and identified utility concepts, at least one feature in the natural language text is identified that implicates the at least one identified sensitive concept more than the at least one identified utility concept. At least some of the features thus identified may be perturbed such that the modified natural language text may be provided as at least one redacted document. In this manner, features are perturbed to maximize classification error for sensitive concepts while simultaneously minimizing classification error in the utility concepts.
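The redaction idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes smoothed per-word log-odds as a stand-in for the two classifier models, and it "perturbs" a feature simply by masking it. The function names `log_odds` and `redact`, the mask string, and the toy documents are all hypothetical.

```python
import math
from collections import Counter

def log_odds(docs_in, docs_out):
    """Smoothed per-word log-odds of appearing in docs_in versus docs_out.

    Assumption: this simple score stands in for a trained concept model.
    """
    c_in = Counter(w for d in docs_in for w in d.lower().split())
    c_out = Counter(w for d in docs_out for w in d.lower().split())
    vocab = set(c_in) | set(c_out)
    n_in, n_out = sum(c_in.values()), sum(c_out.values())
    return {
        w: math.log((c_in[w] + 1) / (n_in + len(vocab)))
        - math.log((c_out[w] + 1) / (n_out + len(vocab)))
        for w in vocab
    }

def redact(text, sensitive_w, utility_w, mask="[REDACTED]"):
    """Mask words that implicate the sensitive concept more than the utility concept."""
    out = []
    for tok in text.split():
        s = sensitive_w.get(tok.lower(), 0.0)
        u = utility_w.get(tok.lower(), 0.0)
        out.append(mask if s > 0 and s > u else tok)
    return " ".join(out)

# Hypothetical toy corpora: one sensitive concept, one utility concept, some background text.
sensitive_docs = ["patient tested positive for hiv", "hiv treatment plan"]
utility_docs = ["invoice for treatment due", "patient invoice overdue"]
background = ["meeting agenda for monday", "lunch plan for monday"]

sensitive_w = log_odds(sensitive_docs, utility_docs + background)
utility_w = log_odds(utility_docs, sensitive_docs + background)
print(redact("patient invoice mentions hiv treatment", sensitive_w, utility_w))
```

In this toy setup only the word most indicative of the sensitive concept is masked, while words that also carry the utility concept (such as "patient" and "treatment") are kept.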
    • 3. Granted invention patent
    • Preprocessing of text
    • US08620836B2
    • 2013-12-31
    • US12987469
    • 2011-01-10
    • Rayid Ghani; Chad Cumby; Marko Krema
    • Rayid Ghani; Chad Cumby; Marko Krema
    • G06F15/18
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
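The split, cluster, and filter steps in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented procedure: clustering is reduced to a single nearest-seed assignment step, relevance is taken to be the cosine similarity between a segment and the other members of its cluster, and the stopword list, seed indices, and threshold are hypothetical choices.

```python
import math
import re
from collections import Counter

STOPWORDS = {"the", "a", "in", "on", "has"}  # tiny illustrative list (assumption)

def split_sentences(text):
    """Split the body of text into sentence segments at ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def vectorize(segment):
    """Bag-of-words vector for one segment."""
    tokens = re.findall(r"[a-z0-9]+", segment.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def filter_text(text, seed_indices, min_relevance=0.3):
    """Drop segments with low relevance to the rest of their cluster."""
    segments = split_sentences(text)
    vectors = [vectorize(s) for s in segments]
    seeds = [vectors[i] for i in seed_indices]
    # Assign every segment to its most similar seed (one k-means-style step).
    labels = [max(range(len(seeds)), key=lambda k: cosine(v, seeds[k])) for v in vectors]
    kept = []
    for i, (seg, vec) in enumerate(zip(segments, vectors)):
        others = Counter()
        for j, other in enumerate(vectors):
            if j != i and labels[j] == labels[i]:
                others.update(other)
        # A segment alone in its cluster has nothing to be compared with; keep it.
        if not others or cosine(vec, others) >= min_relevance:
            kept.append(seg)
    return " ".join(kept)

text = (
    "The camera has a 20 megapixel sensor. The camera sensor captures sharp photos. "
    "Battery life lasts ten hours. The battery charges in two hours. "
    "Free shipping on orders today, battery included."
)
print(filter_text(text, seed_indices=[0, 2]))
```

Here the shipping sentence lands in the battery cluster but shares little vocabulary with its neighbours, so it falls below the threshold and is removed before any later extraction step.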
    • 4. Granted invention patent
    • Identification of attributes and values using multiple classifiers
    • US08504492B2
    • 2013-08-06
    • US12987505
    • 2011-01-10
    • Rayid Ghani; Chad Cumby; Marko Krema
    • Rayid Ghani; Chad Cumby; Marko Krema
    • G06F15/18
    • G06F17/30705
    • A body of text comprises a plurality of unknown attributes and a plurality of unknown values. A first classification sub-component labels a first portion of the plurality of unknown values as a first set of values, whereas a second classification sub-component labels a portion of the plurality of unknown attributes as a set of attributes and a second portion of the plurality of unknown values as a second set of values. Learning models implemented by the first and second classification subcomponents are updated based on the set of attributes and the first and second set of values. The first classification sub-component implements at least one supervised classification technique, whereas the second classification sub-component implements an unsupervised and/or semi-supervised classification technique. Active learning may be employed to provide at least one of a corrected attribute and/or corrected value that may be used to update the learning models.
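The division of labour between the two classification sub-components described above might look roughly like the sketch below. It is a toy illustration only: an exact match against labelled seed values stands in for the supervised technique, a neighbour-based bootstrapping rule stands in for the semi-supervised one, and the input is assumed to be pre-extracted (value, attribute) word pairs. The function name and the shoe-related data are hypothetical.

```python
def label_attributes_and_values(pairs, seed_values):
    """Label attributes and values in (value_word, attribute_word) pairs.

    pairs: candidate pairs such as ("leather", "upper"); assumed pre-extracted.
    seed_values: set of values known from labelled training data.
    """
    # First sub-component (supervised stand-in): match against labelled values.
    first_values = {v for v, _ in pairs if v in seed_values}

    # Second sub-component (semi-supervised stand-in):
    # a word paired with a known value is labelled as an attribute ...
    attributes = {a for v, a in pairs if v in first_values}
    # ... and an unknown word paired with such an attribute is labelled as a value.
    second_values = {v for v, a in pairs if a in attributes and v not in first_values}

    # Model update: both sets of values feed back into the value lexicon.
    updated_values = set(seed_values) | second_values
    return attributes, first_values, second_values, updated_values

pairs = [("leather", "upper"), ("rubber", "sole"), ("suede", "upper")]
attrs, first, second, updated = label_attributes_and_values(pairs, {"leather", "rubber"})
print(attrs, first, second, updated)
```

In this example "suede" is never in the labelled seed set; it is picked up as a value only because it co-occurs with "upper", an attribute discovered through the first set of values.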
    • 5. Invention patent application
    • PREPROCESSING OF TEXT
    • US20120179453A1
    • 2012-07-12
    • US12987469
    • 2011-01-10
    • Rayid Ghani; Chad Cumby; Marko Krema
    • Rayid Ghani; Chad Cumby; Marko Krema
    • G06F17/27
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
    • 6. Invention patent application
    • IDENTIFICATION OF ATTRIBUTES AND VALUES USING MULTIPLE CLASSIFIERS
    • US20120179633A1
    • 2012-07-12
    • US12987505
    • 2011-01-10
    • Rayid Ghani; Chad Cumby; Marko Krema
    • Rayid Ghani; Chad Cumby; Marko Krema
    • G06F15/18
    • G06F17/30705
    • A body of text comprises a plurality of unknown attributes and a plurality of unknown values. A first classification sub-component labels a first portion of the plurality of unknown values as a first set of values, whereas a second classification sub-component labels a portion of the plurality of unknown attributes as a set of attributes and a second portion of the plurality of unknown values as a second set of values. Learning models implemented by the first and second classification subcomponents are updated based on the set of attributes and the first and second set of values. The first classification sub-component implements at least one supervised classification technique, whereas the second classification sub-component implements an unsupervised and/or semi-supervised classification technique. Active learning may be employed to provide at least one of a corrected attribute and/or corrected value that may be used to update the learning models.
    • 8. Invention patent application
    • DETERMINATION OF DOCUMENT CREDIBILITY
    • US20130054502A1
    • 2013-02-28
    • US13221592
    • 2011-08-30
    • Andrew E. Fano; Chad Cumby; Marko Krema; Sai P. Kandallu
    • Andrew E. Fano; Chad Cumby; Marko Krema; Sai P. Kandallu
    • G06N5/02
    • G06N5/02; G06F17/30554
    • A plurality of topics encompassed in a document are determined and, for each such topic, a sentiment for that topic is likewise determined. Thereafter, credibility of the document is determined based on the resulting plurality of sentiments. In one embodiment, credibility of at least one target document is established by first determining, for each of a plurality of portions of the at least one target document, at least one topic encompassed in the portion to provide a plurality of target topics. Likewise, sentiment scores are determined for each portion. Thereafter, for each prior topic of a plurality of prior topics, a topic-sentiment score is determined based on sentiment scores corresponding to those portions of the plurality of portions having a target topic corresponding to the prior topic. A credibility index is determined based on the resulting plurality of topic-sentiment scores.
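The scoring flow in the abstract above (per-portion topics and sentiment, then per-topic scores, then a single index) can be sketched as below. The abstract does not say how scores are combined, so this sketch assumes a plain mean at both levels; the dictionary layout, function names, and sample numbers are hypothetical.

```python
from statistics import mean

def topic_sentiment_scores(portions, prior_topics):
    """Average the sentiment of all portions whose topics include each prior topic."""
    scores = {}
    for topic in prior_topics:
        matched = [p["sentiment"] for p in portions if topic in p["topics"]]
        if matched:  # prior topics absent from the document get no score
            scores[topic] = mean(matched)
    return scores

def credibility_index(portions, prior_topics):
    """Assumed aggregation: mean of the topic-sentiment scores, or None if there are none."""
    scores = topic_sentiment_scores(portions, prior_topics)
    return mean(list(scores.values())) if scores else None

# Hypothetical target document split into three portions.
portions = [
    {"topics": {"price"}, "sentiment": 0.5},
    {"topics": {"price", "quality"}, "sentiment": 1.0},
    {"topics": {"quality"}, "sentiment": -0.5},
]
prior_topics = ["price", "quality", "support"]

print(topic_sentiment_scores(portions, prior_topics))
print(credibility_index(portions, prior_topics))
```

A real system would likely weight topics or compare the scores against expected sentiment for each prior topic; the unweighted mean here is only the simplest choice that fits the description.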
    • 9. Invention patent application
    • ACTION RECOGNITION AND INTERPRETATION USING A PRECISION POSITIONING SYSTEM
    • US20100106453A1
    • 2010-04-29
    • US12683109
    • 2010-01-06
    • Andrew E. Fano; Chad Cumby
    • Andrew E. Fano; Chad Cumby
    • G06F15/00; G06F17/00
    • G06K9/00771
    • To facilitate the recognition and interpretation of actions undertaken within an environment, the environment is associated with a precision positioning system (PPS) and a controller in communication with the PPS. Within the environment, an entity moves about in furtherance of one or more tasks to be completed within the environment. The PPS determines position data corresponding to at least a portion of the entity, which position data is subsequently compared with at least one known action corresponding to a predetermined task within the environment. Using a state-based task model, recognized actions may be interpreted and used to initiate at least one system action based on the current state of the task model and correspondence of the position data to the at least one known action. In an embodiment, an entity recognition system provides an identity of the entity to determine whether the entity is authorized to perform an action.
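The interplay of position data, known actions, and a state-based task model described above can be sketched as a small state machine. This is an assumption-laden illustration: known actions are modelled as rectangular zones in 2D, the task model is a hand-written transition table, and the entity authorization step is left out. All names, zones, and states are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KnownAction:
    """A known action, modelled (as an assumption) as a rectangular zone."""
    name: str
    x_range: tuple
    y_range: tuple

    def matches(self, x, y):
        return (self.x_range[0] <= x <= self.x_range[1]
                and self.y_range[0] <= y <= self.y_range[1])

# Task model: (current state, recognized action) -> (next state, system action).
TRANSITIONS = {
    ("idle", "reach_shelf"): ("item_picked", "log_pick"),
    ("item_picked", "reach_bench"): ("item_placed", "show_next_step"),
}

def interpret(positions, known_actions, transitions, state="idle"):
    """Compare each position with the known actions and advance the task model."""
    system_actions = []
    for x, y in positions:
        for action in known_actions:
            key = (state, action.name)
            # An action only triggers a system action if it is valid in the current state.
            if action.matches(x, y) and key in transitions:
                state, system_action = transitions[key]
                system_actions.append(system_action)
                break
    return state, system_actions

actions = [
    KnownAction("reach_shelf", (0.0, 1.0), (0.0, 1.0)),
    KnownAction("reach_bench", (4.0, 5.0), (0.0, 1.0)),
]
positions = [(4.5, 0.5), (0.5, 0.5), (2.0, 2.0), (4.2, 0.8)]
print(interpret(positions, actions, TRANSITIONS))
```

The first position falls in the bench zone, but the task model is still in its initial state, so it is ignored; the same zone triggers a system action later, once the shelf action has been recognized.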
    • 10. Granted invention patent
    • Determination of document credibility
    • US08650143B2
    • 2014-02-11
    • US13221592
    • 2011-08-30
    • Andrew E. Fano; Chad Cumby; Marko Krema; Sai P. Kandallu
    • Andrew E. Fano; Chad Cumby; Marko Krema; Sai P. Kandallu
    • G06F17/00; G06N5/02
    • G06N5/02; G06F17/30554
    • A plurality of topics encompassed in a document are determined and, for each such topic, a sentiment for that topic is likewise determined. Thereafter, credibility of the document is determined based on the resulting plurality of sentiments. In one embodiment, credibility of at least one target document is established by first determining, for each of a plurality of portions of the at least one target document, at least one topic encompassed in the portion to provide a plurality of target topics. Likewise, sentiment scores are determined for each portion. Thereafter, for each prior topic of a plurality of prior topics, a topic-sentiment score is determined based on sentiment scores corresponding to those portions of the plurality of portions having a target topic corresponding to the prior topic. A credibility index is determined based on the resulting plurality of topic-sentiment scores.