    • 1. Granted invention patent
    • System and method for efficient representation of data set addresses in a web crawler
    • Publication number: US06301614B1
    • Publication date: 2001-10-09
    • Application number: US09433008
    • Filing date: 1999-11-02
    • Inventors: Marc Alexander Najork; Clark Allan Heydon
    • IPC: G06F15/173
    • CPC: H04L29/12066; G06F17/30902; H04L29/12594; H04L61/1511; H04L61/301; H04L67/02; H04L67/2842; H04L67/2852; Y10S707/99933; Y10S707/99935
    • A web crawler stores fixed length representations of document addresses in first and second caches and a disk file. When the web crawler downloads a document from a host computer, it identifies URLs (document addresses) in the downloaded document. Each identified URL is converted into a fixed size numerical representation. The numerical representation is systematically compared to numerical representations in the caches and disk file. If the representation is not found in the caches and disk file, the document corresponding to the representation is scheduled for downloading, and the representation is stored in the second cache. If the representation is not found in the caches but is found in the disk file, the representation is added to the first cache. When the second cache is full, it is merged with the disk file and the second cache is reset to an initial state. When the first cache is full, one or more representations are evicted in accordance with an eviction policy. The representations include a prefix that is a function of a host component of the corresponding URLs, and the representations are stored in the disk file in sorted order. When the web crawler searches for a representation in the disk file, an index of the disk file is searched to identify a single block of the disk file, and only that single block of the disk file is searched for the representation.
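The deduplication scheme in the abstract above can be illustrated with a short Python sketch. It is only a rough reading of the abstract, not the patented implementation; the names (fingerprint, UrlSeenTest), the hash function, and the capacities are assumptions made for illustration.

```python
import bisect
import hashlib

def fingerprint(url: str) -> int:
    """Fixed-size numerical representation of a URL. The high-order bits are a
    function of the host component, so same-host fingerprints sort near each other."""
    host = url.split("//", 1)[-1].split("/", 1)[0]
    host_prefix = int.from_bytes(hashlib.md5(host.encode()).digest()[:3], "big")
    url_hash = int.from_bytes(hashlib.md5(url.encode()).digest()[:5], "big")
    return (host_prefix << 40) | url_hash

class UrlSeenTest:
    def __init__(self, cache_capacity=1024, buffer_capacity=4096):
        self.popular_cache = set()  # "first cache": fingerprints recently found in the disk file
        self.recent_cache = set()   # "second cache": fingerprints of newly scheduled URLs
        self.disk_file = []         # sorted list standing in for the on-disk fingerprint file
        self.cache_capacity = cache_capacity
        self.buffer_capacity = buffer_capacity

    def _in_disk_file(self, fp: int) -> bool:
        # A real crawler would consult an index and read a single disk block;
        # a binary search over the sorted list plays that role here.
        i = bisect.bisect_left(self.disk_file, fp)
        return i < len(self.disk_file) and self.disk_file[i] == fp

    def should_download(self, url: str) -> bool:
        fp = fingerprint(url)
        if fp in self.popular_cache or fp in self.recent_cache:
            return False
        if self._in_disk_file(fp):
            # Seen before: remember it in the first cache (arbitrary eviction policy here).
            if len(self.popular_cache) >= self.cache_capacity:
                self.popular_cache.pop()
            self.popular_cache.add(fp)
            return False
        # Never seen: record it in the second cache; the caller schedules the download.
        self.recent_cache.add(fp)
        if len(self.recent_cache) >= self.buffer_capacity:
            self.disk_file = sorted(set(self.disk_file) | self.recent_cache)  # merge
            self.recent_cache = set()  # reset the second cache to its initial state
        return True
```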
    • 2. Granted invention patent
    • Web crawler system using plurality of parallel priority level queues having distinct associated download priority levels for prioritizing document downloading and maintaining document freshness
    • Publication number: US06263364B1
    • Publication date: 2001-07-17
    • Application number: US09433007
    • Filing date: 1999-11-02
    • Inventors: Marc Alexander Najork; Clark Allan Heydon; Janet Lynn Wiener
    • IPC: G06F15/16
    • CPC: G06F17/30864
    • A web crawler downloads documents from among a plurality of host computers. The web crawler enqueues document addresses in a data structure called the Frontier. The Frontier generally includes a set of queues, with all document addresses sharing a respective common host component being stored in a respective common one of the queues. Multiple threads substantially concurrently process the document addresses in the queues. The Frontier includes a set of parallel “priority queues,” each associated with a distinct priority level. Queue elements for documents to be downloaded are assigned a priority level, and then stored in the corresponding priority queue. Queue elements are then distributed from the priority queues to a set of underlying queues in accordance with their relative priorities. The threads then process the queue elements in the underlying queues. When performing a continuous crawl, the web crawler reinserts the queue element for a downloaded document into the Frontier in accordance with a download priority level associated with the downloaded document. For example, the download priority level may be determined as a function of an expiration date and time associated with the document whose document address is denoted by the queue element.
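A simplified, single-threaded sketch of the Frontier described above follows; the class name, the three priority levels, and the freshness thresholds are assumptions, not details from the patent.

```python
from collections import deque

class Frontier:
    def __init__(self, num_priority_levels=3):
        # Parallel priority queues, one per download priority level (0 = highest).
        self.priority_queues = [deque() for _ in range(num_priority_levels)]
        # Underlying queues, one per host, that worker threads actually service.
        self.host_queues = {}

    def enqueue(self, url: str, priority: int):
        """Assign the queue element a priority level and store it in that priority queue."""
        self.priority_queues[priority].append(url)

    def distribute(self):
        """Move queue elements from the priority queues, highest priority first,
        into the underlying per-host queues."""
        for queue in self.priority_queues:
            while queue:
                url = queue.popleft()
                host = url.split("//", 1)[-1].split("/", 1)[0]
                self.host_queues.setdefault(host, deque()).append(url)

    def reinsert_after_download(self, url: str, seconds_until_expiry: float):
        """Continuous crawl: re-enqueue a downloaded document with a priority
        level derived from how soon it is expected to expire."""
        if seconds_until_expiry < 3600:
            self.enqueue(url, 0)       # stale soon, refresh with high priority
        elif seconds_until_expiry < 86400:
            self.enqueue(url, 1)
        else:
            self.enqueue(url, 2)       # fresh for a long time, low priority
```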
    • 5. Invention patent application
    • Method and apparatus to facilitate determining insurance policy element availability
    • Publication number: US20090248453A1
    • Publication date: 2009-10-01
    • Application number: US12058342
    • Filing date: 2008-03-28
    • Inventors: John Lorne Campbell Seybold; Clark Allan Heydon
    • IPC: G06Q40/00
    • CPC: G06Q40/08; G06Q40/02
    • A computing platform (201) of choice can be configured and arranged to access (102 and 103) a first memory (202) that stores attributes which specifically characterize a particular candidate insurance entity and a second memory (203) that stores mappings which relate insurance policy element availability for a plurality of insurance policy elements to various corresponding insurance entity characterizing attributes. The aforementioned attributes, by one approach, correspond to respective end user-configurable dimensions. This platform can then serve (104) as a matching component by automatically using the aforementioned mappings to determine available insurance policy elements for the particular candidate insurance entity as a function of these attributes and then automatically using (105) those available insurance policy elements to configure an insurance policy consistent with these mappings.
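The matching step described above amounts to looking up each characterizing attribute in the availability mappings and intersecting the results. The sketch below shows only that idea; the attribute names, sample values, and policy elements are hypothetical.

```python
# "First memory": attributes characterizing a particular candidate insurance entity.
candidate_attributes = {
    "vehicle_type": "motorcycle",
    "driver_age_band": "under_25",
}

# "Second memory": mappings from attribute values to available policy elements.
element_availability = {
    ("vehicle_type", "motorcycle"): {"liability", "theft"},
    ("vehicle_type", "sedan"): {"liability", "collision", "theft", "glass"},
    ("driver_age_band", "under_25"): {"liability", "collision"},
}

def available_policy_elements(attributes, mappings):
    """Matching component: intersect the elements permitted by each attribute."""
    available = None
    for key in attributes.items():
        permitted = mappings.get(key, set())
        available = permitted if available is None else available & permitted
    return available or set()

# Only elements consistent with every attribute remain available for the policy.
print(available_policy_elements(candidate_attributes, element_availability))  # {'liability'}
```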
    • 6. Granted invention patent
    • System and method for enforcing politeness while scheduling downloads in a web crawler
    • Publication number: US06321265B1
    • Publication date: 2001-11-20
    • Application number: US09433005
    • Filing date: 1999-11-02
    • Inventors: Marc Alexander Najork; Clark Allan Heydon
    • IPC: G06F13/00
    • CPC: G06F17/30902
    • A web crawler downloads data sets from among a plurality of host computers. The web crawler enqueues data set addresses in a set of queues, with all data set addresses sharing a respective common host address being stored in a respective common one of the queues. Each non-empty queue is assigned a next download time. Multiple threads substantially concurrently process the data set addresses in the queues. The number of queues is at least as great as the number of threads, and the threads are dynamically assigned to the queues. In particular, each thread selects a queue not being serviced by any of the other threads. The queue is selected in accordance with the next download times assigned to the queues. The data set corresponding to a data set address in the selected queue is downloaded and processed, and the data set address is dequeued from the selected queue. When the selected queue is not empty after the dequeuing step, it is assigned an updated download time. Then the thread deselects the selected queue, and the process of selecting a queue and processing a data set repeats. The next download time assigned to each queue is preferably a function of the length of time it took to download a previous document whose address was stored in the queue. For instance, the next download time may be set equal to the current time plus a scaling constant multiplied by the download time of the previous document.
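The politeness rule above, one queue per host with an assigned next download time, can be sketched as follows. This single-threaded loop stands in for the multi-threaded crawler, and the helper names and the value of the scaling constant are assumptions.

```python
import heapq
import time
from collections import deque

SCALING_CONSTANT = 10.0  # wait roughly 10x the previous download time before revisiting a host

def crawl(seed_urls, download):
    host_queues = {}   # one queue of URLs per host
    ready_heap = []    # (next download time, host), soonest first
    for url in seed_urls:
        host = url.split("//", 1)[-1].split("/", 1)[0]
        if host not in host_queues:
            host_queues[host] = deque()
            heapq.heappush(ready_heap, (time.monotonic(), host))
        host_queues[host].append(url)

    while ready_heap:
        next_time, host = heapq.heappop(ready_heap)    # select the queue whose turn is next
        time.sleep(max(0.0, next_time - time.monotonic()))
        url = host_queues[host].popleft()
        started = time.monotonic()
        download(url)                                   # fetch and process the document
        elapsed = time.monotonic() - started
        if host_queues[host]:
            # Next download time = current time + scaling constant * previous download time.
            heapq.heappush(ready_heap, (time.monotonic() + SCALING_CONSTANT * elapsed, host))
```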
    • 8. Granted invention patent
    • Insurance policy revisioning method and apparatus
    • Publication number: US08676703B2
    • Publication date: 2014-03-18
    • Application number: US11412670
    • Filing date: 2006-04-27
    • Inventors: Clark Allan Heydon; Kenneth William Branson
    • IPC: G06Q40/00
    • CPC: G06Q40/08
    • An insurance policy is stored (101) as a plurality of discrete temporally-sequential policy data revisions. A legally binding revision for a first given date is then determined (102) by identifying all policy data revisions effective on the first given date and choosing a most temporally recent policy data revision temporally prior to a second given date. When a new policy data revision is (103) temporally subsequent as compared to a first policy data revision and also comprises a legally effective date range preceding at least in part an effective date range of the first policy data revision, legally non-overlapping policy data revisions are created (104) for each legally overlapping effective date range as exists between the new policy data revision and all temporally preceding revisions. Each newly-created legally non-overlapping policy data revision comprises changes introduced by the new policy data revision and at least one temporally preceding policy data revision.
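The first operation described above, finding the legally binding revision for a given date, reduces to filtering the stored revisions and taking the most recently entered one. The field names, dates, and premium figures in this sketch are hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class PolicyRevision:
    entered_on: date        # when the revision was recorded (temporal order)
    effective_from: date    # start of the legally effective date range
    effective_to: date      # end of the legally effective date range
    data: dict              # the policy changes carried by this revision

def binding_revision(revisions, effective_date, as_of_date):
    """Among revisions effective on `effective_date`, return the one entered most
    recently but no later than `as_of_date`."""
    candidates = [
        r for r in revisions
        if r.effective_from <= effective_date <= r.effective_to
        and r.entered_on <= as_of_date
    ]
    return max(candidates, key=lambda r: r.entered_on, default=None)

revisions = [
    PolicyRevision(date(2006, 1, 1), date(2006, 1, 1), date(2006, 12, 31), {"premium": 900}),
    PolicyRevision(date(2006, 6, 1), date(2006, 3, 1), date(2006, 12, 31), {"premium": 950}),
]
print(binding_revision(revisions, date(2006, 4, 15), date(2006, 7, 1)).data)  # {'premium': 950}
```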
    • 9. Invention patent application
    • Insurance Policy Revisioning Method
    • Publication number: US20100070311A1
    • Publication date: 2010-03-18
    • Application number: US12623572
    • Filing date: 2009-11-23
    • Inventors: Clark Allan Heydon; Kenneth William Branson
    • IPC: G06Q40/00; G06F17/30
    • CPC: G06Q40/08
    • An insurance policy is stored (101) as a plurality of discrete temporally-sequential policy data revisions. A legally binding revision for a first given date is then determined (102) by identifying all policy data revisions effective on the first given date and choosing a most temporally recent policy data revision temporally prior to a second given date. When a new policy data revision is (103) temporally subsequent as compared to a first policy data revision and also comprises a legally effective date range preceding at least in part an effective date range of the first policy data revision, legally non-overlapping policy data revisions are created (104) for each legally overlapping effective date range as exists between the new policy data revision and all temporally preceding revisions. Each newly-created legally non-overlapping policy data revision comprises changes introduced by the new policy data revision and at least one temporally preceding policy data revision.
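This application shares its abstract with the granted patent above; the sketch there covers the binding-revision lookup, so the one below covers the second operation, splitting an out-of-order revision into legally non-overlapping pieces. Revisions are reduced to plain dicts with hypothetical fields, and only a single preceding revision is handled.

```python
from datetime import date, timedelta

def split_overlap(existing, new):
    """Split a new revision whose effective range starts before an existing revision's
    range into non-overlapping pieces; the overlapping piece carries both sets of changes."""
    pieces = []
    overlap_start = max(existing["from"], new["from"])
    overlap_end = min(existing["to"], new["to"])
    if new["from"] < overlap_start:
        # Range where only the new revision is in effect.
        pieces.append({"from": new["from"], "to": overlap_start - timedelta(days=1),
                       "data": dict(new["data"])})
    if overlap_start <= overlap_end:
        # Overlapping range: changes from the preceding revision plus the new one.
        pieces.append({"from": overlap_start, "to": overlap_end,
                       "data": {**existing["data"], **new["data"]}})
    return pieces

existing = {"from": date(2006, 6, 1), "to": date(2006, 12, 31), "data": {"deductible": 500}}
new = {"from": date(2006, 3, 1), "to": date(2006, 12, 31), "data": {"premium": 950}}
for piece in split_overlap(existing, new):
    print(piece)
# One piece for 2006-03-01..2006-05-31 with only the new changes, and one for
# 2006-06-01..2006-12-31 combining both revisions' changes.
```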
    • 10. Granted invention patent
    • System and method for efficient filtering of data set addresses in a web crawler
    • Publication number: US06952730B1
    • Publication date: 2005-10-04
    • Application number: US09607710
    • Filing date: 2000-06-30
    • Inventors: Marc Alexander Najork; Clark Allan Heydon
    • IPC: G06F13/00; G06F17/30
    • CPC: G06F17/30864
    • A web crawler stores fixed length representations of document addresses in a buffer and a disk file, and optionally in a cache. When the web crawler downloads a document from a host computer, it identifies URLs (document addresses) in the downloaded document. Each identified URL is converted into a fixed size numerical representation. The numerical representation may optionally be systematically compared to the contents of a cache containing web sites which are likely to be found during the web crawl, for example previously visited web sites. The numerical representation is then systematically compared to numerical representations in the buffer, which stores numerical representations of recently-identified URLs. If the representation is not found in the buffer, it is stored in the buffer. When the buffer is full, it is ordered and then merged with numerical representations stored, in order, in the disk file. In addition, the document corresponding to each representation not found in the disk file during the merge is scheduled for downloading. The disk file may be a sparse file, indexed to correspond to the numerical representations of the URLs, so that only a relatively small fraction of the disk file must be searched and re-written in order to merge each numerical representation in the buffer.
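The buffer-and-merge filtering described above can be sketched as follows; the optional cache of sites likely to be encountered is omitted, the sorted disk file is modelled as an in-memory list, and the names and tiny buffer capacity are assumptions.

```python
import bisect
import hashlib

class UrlFilter:
    def __init__(self, buffer_capacity=4):
        self.buffer = {}         # fingerprint -> URL for recently identified URLs
        self.disk_file = []      # stand-in for the sorted on-disk fingerprint file
        self.buffer_capacity = buffer_capacity

    @staticmethod
    def fingerprint(url: str) -> int:
        """Fixed-size numerical representation of a URL."""
        return int.from_bytes(hashlib.md5(url.encode()).digest()[:8], "big")

    def add(self, url: str, schedule_download):
        fp = self.fingerprint(url)
        if fp not in self.buffer:
            self.buffer[fp] = url
        if len(self.buffer) >= self.buffer_capacity:
            self.merge(schedule_download)

    def merge(self, schedule_download):
        # Order the buffer, merge it into the sorted disk file, and schedule every
        # document whose fingerprint the disk file did not already contain.
        for fp in sorted(self.buffer):
            i = bisect.bisect_left(self.disk_file, fp)
            if i == len(self.disk_file) or self.disk_file[i] != fp:
                self.disk_file.insert(i, fp)
                schedule_download(self.buffer[fp])
        self.buffer = {}
```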