    • 1. Granted invention patent
    • Creating taxonomies and training data for document categorization
    • US08341159B2
    • 2012-12-25
    • US11734528
    • 2007-04-12
    • Stephen C. Gates
    • Stephen C. Gates
    • G06F7/00; G06F17/30
    • G06F17/3071; Y10S707/99935; Y10S707/99943
    • Methods, apparatus and systems are provided to generate from a set of training documents a set of training data and a set of features for a taxonomy of categories. In this generated taxonomy the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer. However, the categories still make sense to a human because a human makes the decisions regarding category definitions. In an example embodiment, for each category, a plurality of training documents selected using Web search engines is generated, the documents winnowed to produce a more refined set of training documents, and a set of features highly differentiating for that category within a set of categories (a supercategory) extracted. This set of training documents or differentiating features is used as input to a categorizer, which determines for a plurality of test documents the plurality of categories to which they best belong.
    • 7. Granted invention patent
    • Creating taxonomies and training data for document categorization
    • US07409404B2
    • 2008-08-05
    • US10205666
    • 2002-07-25
    • Stephen C. Gates
    • Stephen C. Gates
    • G06F7/00
    • G06F17/3071; Y10S707/99935; Y10S707/99943
    • Methods, apparatus and systems to generate from a set of training documents a set of training data and a set of features for a taxonomy of categories. In this generated taxonomy the degree of feature overlap among categories is minimized in order to optimize use with a machine-based categorizer. However, the categories still make sense to a human because a human makes the decisions regarding category definitions. In an example embodiment, for each category, a plurality of training documents selected using Web search engines is generated, the documents winnowed to produce a more refined set of training documents, and a set of features highly differentiating for that category within a set of categories (a supercategory) extracted. This set of training documents or differentiating features is used as input to a categorizer, which determines for a plurality of test documents the plurality of categories to which they best belong.
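The abstracts above describe a pipeline: collect training documents per category, winnow them, then extract features that differentiate each category within its supercategory. The following is only a rough Python sketch of that idea, not the patented method. The abstracts do not say how winnowing or feature scoring is done, so both heuristics here are assumptions: documents are winnowed by how typical their terms are for the category, and a term is scored by its document frequency in the category minus its highest document frequency in any sibling category. All function names, the `keep_ratio` and `top_n` parameters, and the toy data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer that keeps alphabetic tokens only."""
    return [w for w in text.lower().split() if w.isalpha()]


def doc_frequency(docs):
    """Fraction of documents in which each term appears."""
    counts = Counter()
    for doc in docs:
        counts.update(set(tokenize(doc)))
    return {term: n / len(docs) for term, n in counts.items()}


def winnow(docs, keep_ratio=0.75):
    """Assumed heuristic: keep the documents whose terms are most typical
    of the category, dropping the least typical ones."""
    df = doc_frequency(docs)

    def typicality(doc):
        terms = set(tokenize(doc))
        return sum(df[t] for t in terms) / len(terms) if terms else 0.0

    ranked = sorted(docs, key=typicality, reverse=True)
    keep = max(1, math.ceil(len(ranked) * keep_ratio))
    return ranked[:keep]


def differentiating_features(supercategory, top_n=3):
    """Assumed scoring: in-category document frequency minus the highest
    document frequency of the same term in any sibling category."""
    dfs = {cat: doc_frequency(winnow(docs)) for cat, docs in supercategory.items()}
    features = {}
    for cat, df in dfs.items():
        scores = {}
        for term, freq in df.items():
            rival = max(
                (other.get(term, 0.0) for name, other in dfs.items() if name != cat),
                default=0.0,
            )
            scores[term] = freq - rival
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        features[cat] = [t for t in ranked[:top_n] if scores[t] > 0]
    return features


# Toy supercategory with two categories; the last "dogs" document is off-topic.
supercategory = {
    "dogs": ["the dog barks loudly", "a dog chases the ball",
             "the dog sleeps", "stock prices fell"],
    "cats": ["the cat purrs", "a cat chases the mouse",
             "the cat sleeps", "the cat climbs"],
}
features = differentiating_features(supercategory)
```

In this toy run the off-topic document is winnowed out of "dogs", and terms shared by both categories (such as "the") score zero and are not selected. The resulting per-category feature lists are the kind of input a downstream categorizer could consume.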