    • 60. Invention application
    • METHOD AND TECHNIQUES FOR DETERMINING CRAWLING SCHEDULE
    • US20130179424A1
    • 2013-07-11
    • US13348438
    • 2012-01-11
    • Cheng Xu; Qiying Lin; Xin Li
    • G06F17/30
    • G06F17/30864; G06Q30/0241; H04L67/02
    • Methods, systems and computer-readable storage medium for determining a crawling schedule. In an aspect, a method includes obtaining crawl history data for a Web site having Web pages, determining a status of the Web pages, determining a total quantity of Web pages that have a status of deleted, calculating a probability that another Web page of the Web site will be removed based on the total quantity, and storing data associating the calculated probability with the Web site. The method can further include determining, for a plurality of sets of the previous time periods, a respective crawl penalty as a combination of a penalty for crawling the Web site and a penalty for showing a deleted Web page based on the calculated probability, and determining a re-crawl schedule based on the crawl penalties.
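The steps named in the abstract can be illustrated with a minimal sketch. It is not the patented method: the abstract gives no formulas, so the estimate of the deletion probability (share of crawled pages with a deleted status), the penalty formula, the cost weights, and all names (`CrawlRecord`, `deletion_probability`, `crawl_penalty`, `choose_recrawl_interval`, `example.com`) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CrawlRecord:
    url: str
    status: str  # "live" or "deleted" (hypothetical status labels)


def deletion_probability(history):
    """Estimate the chance that another page of the site is removed.

    Assumption: the share of crawled pages whose status is "deleted".
    """
    if not history:
        return 0.0
    deleted = sum(1 for record in history if record.status == "deleted")
    return deleted / len(history)


def crawl_penalty(interval, p_deleted, crawl_cost=1.0, stale_cost=10.0):
    """Assumed penalty for re-crawling once every `interval` time periods.

    Combines a crawling penalty (cost spread over the interval) with a
    penalty for showing a deleted page (chance that a page is removed at
    some point during the interval, treating periods as independent).
    """
    crawl_term = crawl_cost / interval
    stale_term = stale_cost * (1.0 - (1.0 - p_deleted) ** interval)
    return crawl_term + stale_term


def choose_recrawl_interval(candidates, p_deleted):
    """Pick the candidate interval with the lowest combined penalty."""
    return min(candidates, key=lambda interval: crawl_penalty(interval, p_deleted))


# Crawl history for one hypothetical site: one of four pages was deleted.
history = [
    CrawlRecord("https://example.com/a", "live"),
    CrawlRecord("https://example.com/b", "deleted"),
    CrawlRecord("https://example.com/c", "live"),
    CrawlRecord("https://example.com/d", "live"),
]

# Store the calculated probability in association with the site.
site_probabilities = {"example.com": deletion_probability(history)}

best_interval = choose_recrawl_interval(
    [1, 2, 7], site_probabilities["example.com"]
)
```

Under these assumed weights, a site that deletes pages often is re-crawled at the shortest candidate interval, while a site with no observed deletions is re-crawled at the longest one, because only the crawling cost remains.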