    • 10. Granted invention patent
    • Training set construction for taxonomic classification
    • Publication number: US08122005B1
    • Publication date: 2012-02-21
    • Application number: US12604025
    • Filing date: 2009-10-22
    • Philo Juang, Christopher Testa, Nicolaus Mote
    • G06F17/30
    • G06F17/30707
    • A training set generator may be configured to input a taxonomy including a hierarchy of categories and a plurality of top-level sites, and to output a training set of categorized data. The training set generator may include a crawler configured to crawl each of the top-level sites to determine at least one lower-level site associated therewith and to store the top-level sites and associated lower-level sites as crawl data. The training set generator also may include an extractor configured to determine, for each of the top-level sites, a corresponding site-specific extraction template associating at least one portion of the corresponding top-level site with at least one category of the hierarchy of categories, and further configured to apply each site-specific extraction template to corresponding crawl data to thereby associate the crawl data with the categories of the hierarchical categories and obtain categorized data of the training set.
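
The crawl-then-extract flow described in the abstract can be illustrated with a small sketch. This is not the patented implementation: the in-memory `FAKE_WEB`, the `CATEGORIES` set, the `TEMPLATES` mapping, and the functions `crawl` and `extract` are all hypothetical names chosen for illustration, and the "site-specific extraction template" is assumed here to be a simple mapping from a URL path segment to a category of the taxonomy.

```python
from collections import deque

# Hypothetical stand-in for the web: URL -> (page text, outgoing links).
FAKE_WEB = {
    "shop.example": ("Shop home", ["shop.example/electronics/tv1",
                                   "shop.example/books/b1",
                                   "other.example/ad"]),
    "shop.example/electronics/tv1": ("42-inch LED television", []),
    "shop.example/books/b1": ("Paperback mystery novel", []),
}

# Assumed taxonomy, flattened to category paths for brevity.
CATEGORIES = {"Products/Electronics", "Products/Books"}

# Assumed site-specific templates: URL path segment -> category.
TEMPLATES = {
    "shop.example": {"electronics": "Products/Electronics",
                     "books": "Products/Books"},
}

def crawl(top_level_sites, web):
    """Collect the lower-level pages reachable under each top-level site."""
    crawl_data = {}
    for top in top_level_sites:
        seen, queue, pages = {top}, deque([top]), []
        while queue:
            url = queue.popleft()
            text, links = web.get(url, ("", []))
            if url != top:
                pages.append((url, text))
            for link in links:
                # Only follow links that stay under the same top-level site.
                if link.startswith(top + "/") and link not in seen:
                    seen.add(link)
                    queue.append(link)
        crawl_data[top] = pages
    return crawl_data

def extract(crawl_data, templates, categories):
    """Apply each site's template to its crawl data to label page text."""
    training_set = []
    for top, pages in crawl_data.items():
        template = templates.get(top, {})
        for url, text in pages:
            for segment in url[len(top):].strip("/").split("/"):
                category = template.get(segment)
                if category in categories:
                    training_set.append((text, category))
                    break
    return training_set

training_set = extract(crawl(["shop.example"], FAKE_WEB), TEMPLATES, CATEGORIES)
for text, category in training_set:
    print(f"{category}: {text}")
```

In this sketch the crawl data is kept per top-level site so that each site's own template is applied only to pages found under that site, mirroring the abstract's pairing of a template with its "corresponding crawl data".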