    • Invention application
    • SMART ALGORITHM FOR READING FROM CRAWL QUEUE
    • US20110125726A1
    • 2011-05-26
    • US12625603
    • 2009-11-25
    • Mircea Neagovici-Negoescu; Siddharth Rajendra Shah
    • G06F15/16; G06F17/30
    • G06F16/951
    • A smart algorithm for processing transactions from a crawl queue. If the crawler already holds a predetermined number of URLs in memory for a given host, it reads URLs for other hosts from the crawl queue instead. As a result, the crawler processes multiple hosts concurrently and uses machine resources more efficiently. The algorithm can also consider other criteria when deciding which URLs to read from the queue, such as the response time of each repository (host) the crawler processes. Additionally, the crawler can allocate its resources according to content groups (e.g., two pools): one group for faster content delivery and a second group for slower content delivery. Crawler resources can thus be partitioned across different pools depending on repository response time. Other criteria can be provided and considered as well.
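
The queue-reading rule in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `read_batch`, `pool_for`, `per_host_cap`, `batch_size`, and `threshold_s` are hypothetical, the crawl queue is modeled as an in-memory `deque` of URL strings, and the one-second threshold separating the fast and slow pools is an arbitrary assumption.

```python
from collections import Counter, deque
from urllib.parse import urlsplit


def read_batch(queue, in_memory, per_host_cap, batch_size):
    """Read up to batch_size URLs, skipping hosts that are already at the cap.

    queue        -- deque of URL strings (hypothetical stand-in for the crawl queue)
    in_memory    -- Counter mapping host -> number of URLs currently held in memory
    per_host_cap -- the "predetermined number" of URLs allowed per host (assumed name)
    """
    selected = []
    skipped = deque()
    while queue and len(selected) < batch_size:
        url = queue.popleft()
        host = urlsplit(url).netloc
        if in_memory[host] >= per_host_cap:
            # Host is saturated: leave this URL for a later read.
            skipped.append(url)
        else:
            in_memory[host] += 1
            selected.append(url)
    # Put skipped URLs back at the front of the queue in their original order.
    queue.extendleft(reversed(skipped))
    return selected


def pool_for(avg_response_s, threshold_s=1.0):
    """Assign a host to a 'fast' or 'slow' pool by average response time (assumed rule)."""
    return "fast" if avg_response_s <= threshold_s else "slow"


if __name__ == "__main__":
    crawl_queue = deque([
        "http://a.example/1",
        "http://a.example/2",
        "http://a.example/3",
        "http://b.example/1",
        "http://c.example/1",
    ])
    held = Counter()
    batch = read_batch(crawl_queue, held, per_host_cap=2, batch_size=4)
    print(batch)              # two URLs from a.example, then b.example and c.example
    print(list(crawl_queue))  # the third a.example URL stays queued
    print(pool_for(0.2), pool_for(3.5))
```

In this sketch a saturated host does not block the read: its URLs are set aside and returned to the front of the queue, so URLs from other hosts fill the batch. The caller would decrement `in_memory[host]` as each URL finishes processing, which the sketch leaves out.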