    • 2. Invention application
    • Microhubs and its applications
    • US20070078811A1
    • 2007-04-05
    • US11241469
    • 2005-09-30
    • Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish Penmetsa, Andrew Tomkins
    • Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish Penmetsa, Andrew Tomkins
    • G06F17/30
    • G06F17/30864, Y10S707/99932, Y10S707/99937
    • A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to-be-recrawled webpage is to include links to fresh content published on the website; sorting all the to-be-recrawled pages by their hub scores; and crawling the to-be-recrawled pages in order from highest hub score to lowest hub score. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to-be-recrawled page; computing a second value equaling a percentage of a previous hub score of the to-be-recrawled page; and computing the hub score as a sum of the first and the second values.
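
The scoring and ordering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Page` class, the function names, the 0.5 weights standing in for the two "percentages", and the use of a plain `set` as the lookup structure are all assumptions made for illustration. Links are assumed to be already extracted from each page as URL strings.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Page:
    """A page scheduled for recrawl (hypothetical structure)."""
    url: str
    links: list[str]              # URLs found on the page at the last fetch
    prev_hub_score: float = 0.0   # hub score from the previous crawl cycle


def hub_score(page: Page, known_urls: set[str],
              new_weight: float = 0.5, prev_weight: float = 0.5) -> float:
    """Sum of a fraction of the new-URL count and a fraction of the old score.

    The weights are assumed values; the abstract only says "a percentage".
    """
    # A link is "new" if it is absent from the lookup structure of known URLs.
    new_links = {u for u in page.links if u not in known_urls}
    first_value = new_weight * len(new_links)
    second_value = prev_weight * page.prev_hub_score
    return first_value + second_value


def recrawl_order(pages: list[Page], known_urls: set[str]) -> list[Page]:
    """Return pages sorted from highest hub score to lowest."""
    return sorted(pages, key=lambda p: hub_score(p, known_urls), reverse=True)


# Example data (invented for illustration).
known = {"/a", "/b"}
pages = [
    Page("/index", ["/a", "/b", "/new1", "/new2"], prev_hub_score=1.0),
    Page("/archive", ["/a"], prev_hub_score=4.0),
    Page("/news", ["/n1", "/n2", "/n3", "/n4", "/n5"], prev_hub_score=0.0),
]
order = [p.url for p in recrawl_order(pages, known)]
```

With these inputs, `/news` is crawled first because it exposes the most unseen links, while `/archive` still outranks `/index` on the strength of its previous score. Blending in the previous score keeps a page that has historically surfaced fresh content from dropping to the bottom after one quiet cycle.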