    • 2. Granted invention patent
    • Preprocessing of text
    • US08620836B2
    • 2013-12-31
    • US12987469
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F15/18
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
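The preprocessing steps in the abstract above (split into sentences, cluster the segments, drop segments with low relevance to their cluster) can be sketched as follows. This is a minimal illustration, not the patented method: the bag-of-words vectors, the greedy "leader" clustering, the leave-one-out cosine relevance score, and both thresholds are assumptions chosen for brevity, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence identification: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag_of_words(segment):
    """Lower-cased term counts for one segment."""
    return Counter(re.findall(r"[a-z0-9]+", segment.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def total(vectors):
    """Sum of term-count vectors, used as an unnormalised centroid."""
    out = Counter()
    for vec in vectors:
        out.update(vec)
    return out


def cluster_segments(vectors, join_threshold=0.3):
    """Greedy clustering (an assumption): join the first similar cluster, else start a new one."""
    clusters = []  # each cluster is a list of segment indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, total(vectors[j] for j in members)) >= join_threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def relevance_filter(text, join_threshold=0.3, min_relevance=0.3):
    """Return the text without segments that have low relevance to their cluster."""
    segments = split_sentences(text)
    vectors = [bag_of_words(s) for s in segments]
    keep = set()
    for members in cluster_segments(vectors, join_threshold):
        for i in members:
            # Relevance = similarity to the other members; a lone segment scores 0.
            others = total(vectors[j] for j in members if j != i)
            if cosine(vectors[i], others) >= min_relevance:
                keep.add(i)
    return " ".join(segments[i] for i in sorted(keep))
```

Scoring each segment against the other members of its cluster, rather than against a centroid that includes the segment itself, is one way to make sure an isolated boilerplate sentence is not judged relevant merely because it forms its own cluster.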
    • 3. Granted invention patent
    • Identification of attributes and values using multiple classifiers
    • US08504492B2
    • 2013-08-06
    • US12987505
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F15/18
    • G06F17/30705
    • A body of text comprises a plurality of unknown attributes and a plurality of unknown values. A first classification sub-component labels a first portion of the plurality of unknown values as a first set of values, whereas a second classification sub-component labels a portion of the plurality of unknown attributes as a set of attributes and a second portion of the plurality of unknown values as a second set of values. Learning models implemented by the first and second classification subcomponents are updated based on the set of attributes and the first and second set of values. The first classification sub-component implements at least one supervised classification technique, whereas the second classification sub-component implements an unsupervised and/or semi-supervised classification technique. Active learning may be employed to provide at least one of a corrected attribute and/or corrected value that may be used to update the learning models.
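The division of labour between the two classification sub-components described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the input is taken to be (modifier, head) word pairs, a lookup of labelled seed values stands in for the supervised technique, and a simple bootstrapping rule stands in for the semi-supervised one. The class and function names are hypothetical.

```python
class SeedValueClassifier:
    """Stand-in for the supervised sub-component: it memorises labelled values."""

    def __init__(self, labelled_values):
        self.known_values = set(labelled_values)

    def label_values(self, pairs):
        # First set of values: modifiers already seen with a "value" label.
        return {modifier for modifier, _ in pairs if modifier in self.known_values}

    def update(self, attributes, values):
        self.known_values |= values


class ContextBootstrapper:
    """Stand-in for the semi-supervised sub-component.

    Assumes a head word seen next to a known value is an attribute, and a
    modifier seen next to such an attribute is a value.
    """

    def __init__(self, seed_values):
        self.known_values = set(seed_values)
        self.known_attributes = set()

    def label(self, pairs):
        attributes = {head for modifier, head in pairs
                      if modifier in self.known_values or head in self.known_attributes}
        # Second set of values: new modifiers attached to a labelled attribute.
        values = {modifier for modifier, head in pairs
                  if head in attributes and modifier not in self.known_values}
        return attributes, values

    def update(self, attributes, values):
        self.known_attributes |= attributes
        self.known_values |= values


def extract(pairs, seed_values):
    """Label one batch of pairs, then update both models with the combined result."""
    first = SeedValueClassifier(seed_values)
    second = ContextBootstrapper(seed_values)
    first_values = first.label_values(pairs)
    attributes, second_values = second.label(pairs)
    all_values = first_values | second_values
    for model in (first, second):
        model.update(attributes, all_values)
    return attributes, first_values, second_values
```

With seed values `{"graphite", "leather"}` and pairs such as `("graphite", "frame")` and `("aluminum", "frame")`, the first model labels `graphite`, while the second labels `frame` as an attribute and `aluminum` as a further value. The active-learning correction step mentioned in the abstract is not modelled.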
    • 4. Invention patent application
    • IDENTIFICATION OF ATTRIBUTES AND VALUES USING MULTIPLE CLASSIFIERS
    • US20120179633A1
    • 2012-07-12
    • US12987505
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F15/18
    • G06F17/30705
    • A body of text comprises a plurality of unknown attributes and a plurality of unknown values. A first classification sub-component labels a first portion of the plurality of unknown values as a first set of values, whereas a second classification sub-component labels a portion of the plurality of unknown attributes as a set of attributes and a second portion of the plurality of unknown values as a second set of values. Learning models implemented by the first and second classification subcomponents are updated based on the set of attributes and the first and second set of values. The first classification sub-component implements at least one supervised classification technique, whereas the second classification sub-component implements an unsupervised and/or semi-supervised classification technique. Active learning may be employed to provide at least one of a corrected attribute and/or corrected value that may be used to update the learning models.
    • 6. Invention patent application
    • PREPROCESSING OF TEXT
    • US20120179453A1
    • 2012-07-12
    • US12987469
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F17/27
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
    • 8. Invention patent application
    • DETERMINATION OF A BASIS FOR A NEW DOMAIN MODEL BASED ON A PLURALITY OF LEARNED MODELS
    • US20130018825A1
    • 2013-01-17
    • US13179741
    • 2011-07-11
    • Rayid GHANI, Marko Krema
    • Rayid GHANI, Marko Krema
    • G06F15/18
    • G06N99/005
    • In a machine learning system in which a plurality of learned models, each corresponding to a unique domain, already exist, new domain input for training a new domain model may be provided. Statistical characteristics of features in the new domain input are first determined. The resulting new domain statistical characteristics are then compared with statistical characteristics of features in prior input previously provided for training at least some of the plurality of learned models. Thereafter, at least one learned model of the plurality of learned models is identified as the basis for the new domain model when the new domain input statistical characteristics compare favorably with the statistical characteristics of the features in the prior input corresponding to the at least one learned model.
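The selection step described above (compare the statistical characteristics of the new domain input with those of the inputs used for existing models, then pick a basis) might look like the sketch below. It assumes relative term frequencies as the "statistical characteristics", cosine similarity as the comparison, and a fixed threshold as the test for "compares favorably". None of these choices comes from the abstract, and the function names are hypothetical.

```python
import math
from collections import Counter


def feature_statistics(documents):
    """Relative frequency of each whitespace-separated term across the documents."""
    counts = Counter(token for doc in documents for token in doc.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def similarity(p, q):
    """Cosine similarity between two term-frequency dictionaries."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def choose_basis(new_domain_docs, prior_inputs, threshold=0.5):
    """Return the domain whose prior input best matches the new input, or None.

    prior_inputs maps a domain name to the documents its model was trained on.
    """
    new_stats = feature_statistics(new_domain_docs)
    scores = {domain: similarity(new_stats, feature_statistics(docs))
              for domain, docs in prior_inputs.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

Returning `None` when no score clears the threshold leaves the caller free to train the new domain model from scratch instead of reusing a poorly matched one.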
    • 9. Granted invention patent
    • Extraction of attributes and values from natural language documents
    • US07996440B2
    • 2011-08-09
    • US11742215
    • 2007-04-30
    • Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
    • Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
    • G06F17/30
    • G06F17/27, G06F17/2745
    • One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
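The merging and linking operations mentioned above can be sketched in a few lines. The example assumes the classification step has already labelled each token as "attribute", "value", or "other", and it links a value phrase to the attribute phrase that immediately follows it. That adjacency rule is an illustrative assumption, not the association method of the patent, and the function names are hypothetical.

```python
from itertools import groupby


def merge_phrases(labelled_tokens):
    """Merge runs of consecutive tokens that share a label into one phrase."""
    phrases = []
    for label, group in groupby(labelled_tokens, key=lambda pair: pair[1]):
        phrases.append((" ".join(token for token, _ in group), label))
    return phrases


def link_pairs(phrases):
    """Pair each value phrase with the attribute phrase directly after it."""
    pairs = []
    for (left, left_label), (right, right_label) in zip(phrases, phrases[1:]):
        if left_label == "value" and right_label == "attribute":
            pairs.append((right, left))  # (attribute, value)
    return pairs


tokens = [
    ("stainless", "value"), ("steel", "value"), ("blade", "attribute"),
    ("with", "other"),
    ("rubber", "value"), ("grip", "attribute"),
]
print(link_pairs(merge_phrases(tokens)))
```

Here "stainless" and "steel" merge into one value phrase before linking, so the resulting pair describes the blade as "stainless steel" rather than only "steel".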