    • 1. Granted invention patent
    • Synchronizing crawler with notification source
    • US06424966B1
    • 2002-07-23
    • US09107227
    • 1998-06-30
    • Dmitriy Meyerzon; Sankrant Sanu
    • G06F17/30
    • G06F17/30864; Y10S707/959; Y10S707/99933
    • A method and system for the processing and maintenance of electronic information retrieved from electronic documents stored on a computer network. The gatherer program of the present invention employs a crawler to crawl a portion of the computer network to retrieve electronic documents found during the crawl and that meet a set of crawl restriction rules. Some or all of the data contained in the copies of electronic documents is then stored in a data store such as an index. The invention keeps the data in the data store current by accepting notifications of when a previously retrieved document has changed. The notifications are sent by a notification source that monitors a space containing the previously retrieved documents for changes occurring after the document was last retrieved by the gatherer program. Because the document is being monitored for changes by the notification source, the gatherer program only needs to retrieve the document again when the gatherer program has been notified that the document has changed. If the notification source experiences a discontinuity, such as a system shutdown, the notification source requests that the gatherer perform an initialization crawl to retrieve any documents that changed while the notification source was not operational.
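The notification-driven flow in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `Gatherer`, the `fetch` and `list_documents` callables, and the in-memory `index` dictionary are all assumptions made for the example. As a simplification, the initialization crawl here re-retrieves every document rather than only those that changed during the outage.

```python
from typing import Callable, Dict, Iterable


class Gatherer:
    """Toy gatherer that re-fetches a document only when notified of a change."""

    def __init__(self, fetch: Callable[[str], str],
                 list_documents: Callable[[], Iterable[str]]) -> None:
        self.fetch = fetch                    # hypothetical: returns document content
        self.list_documents = list_documents  # hypothetical: enumerates the monitored space
        self.index: Dict[str, str] = {}       # stands in for the data store / index
        self.fetch_count = 0

    def _retrieve(self, url: str) -> None:
        self.index[url] = self.fetch(url)
        self.fetch_count += 1

    def initialization_crawl(self) -> None:
        # Simplification: retrieve everything in the monitored space.
        for url in self.list_documents():
            self._retrieve(url)

    def on_change_notification(self, url: str) -> None:
        # The notification source reports that one document changed.
        self._retrieve(url)

    def on_discontinuity(self) -> None:
        # The notification source was down, so changes may have been missed.
        self.initialization_crawl()


store = {"doc/a": "v1", "doc/b": "v1"}
gatherer = Gatherer(store.__getitem__, lambda: list(store))
gatherer.initialization_crawl()
store["doc/a"] = "v2"
gatherer.on_change_notification("doc/a")
```

Between notifications the gatherer does no work at all, which is the point of delegating change detection to the notification source.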
    • 2. Granted invention patent
    • Method of web crawling utilizing address mapping
    • US6145003A
    • 2000-11-07
    • US992329
    • 1997-12-17
    • Sankrant Sanu; Dmitriy Meyerzon
    • H04L29/08; H04L29/12; G06F15/173
    • H04L29/12047; H04L29/12009; H04L61/15; H04L67/10; H04L69/329; H04L67/02; Y10S707/99933
    • A computer-based system and method of retrieving information pertaining to Web documents on a computer network is disclosed. The method includes maintaining an address map that associates primary addresses with secondary addresses. A primary address includes a network retrieval protocol and a network address. The secondary address may include a different retrieval protocol or a different network address from the primary document address. A Web crawler retrieves a Web document using the primary document address, and determines whether the address map contains a secondary document address prefix corresponding to the primary document address prefix. If a secondary document address prefix exists, the Web crawler creates a secondary address, retrieves additional information pertaining to the Web document, and combines the additional information with the data retrieved from the Web document. The combined data may be stored in an index, and subsequently used to perform a document search.
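The prefix-based address mapping described in this abstract might look roughly like the sketch below. The function names, the dictionary-based address map, the `fetch` callable, and the example prefixes are assumptions for illustration. The sketch uses the first matching prefix, and the abstract does not say how overlapping prefixes are resolved.

```python
from typing import Callable, Dict, Optional


def secondary_address(url: str, address_map: Dict[str, str]) -> Optional[str]:
    """Rewrite a primary address into a secondary one if a prefix matches."""
    for primary_prefix, secondary_prefix in address_map.items():
        if url.startswith(primary_prefix):
            return secondary_prefix + url[len(primary_prefix):]
    return None


def crawl_document(url: str, address_map: Dict[str, str],
                   fetch: Callable[[str], str]) -> Dict[str, str]:
    """Fetch via the primary address, then merge in data from the secondary one."""
    record = {"url": url, "content": fetch(url)}
    secondary = secondary_address(url, address_map)
    if secondary is not None:
        # Additional information, e.g. metadata reachable only via another protocol.
        record["extra"] = fetch(secondary)
    return record


# Hypothetical mapping from an HTTP prefix to a file-share prefix.
address_map = {"http://server/docs/": "file://server/share/docs/"}
fake_web = {
    "http://server/docs/a.html": "<html>A</html>",
    "file://server/share/docs/a.html": "owner=alice",
    "http://other/b.html": "<html>B</html>",
}
mapped = crawl_document("http://server/docs/a.html", address_map, fake_web.__getitem__)
unmapped = crawl_document("http://other/b.html", address_map, fake_web.__getitem__)
```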
    • 3. Granted invention patent
    • Method and system for incremental web crawling
    • US06631369B1
    • 2003-10-07
    • US09345040
    • 1999-06-30
    • Dmitriy Meyerzon; Srikanth Shoroff; F. Soner Terek; Sankrant Sanu
    • G06F17/30
    • G06F17/30864; Y10S707/99934
    • A Web crawler creates an index of documents in a document store on a computer network. In an initial crawl, the crawler creates a first full index for the document store. The first full crawl is based on a set of predefined “seed” URLs and crawl restrictions, and involves recursively retrieving each folder/document directly or indirectly linked to the seed URLs. In the process of creating the first full index, the crawler creates a History Table containing a list of URLs for each folder and document found in the first full crawl. The History Table also includes a local commit time (LCT) for each document and a deleted documents count (DDC) and LCT or maximum LCT (MLCT) for each folder (this assumes that the store supports a folder hierarchy and the MLCT, LCT and DDC properties). Thereafter, in an incremental crawl, the crawler determines, for each folder, (1) whether the DDC for that folder has changed and (2) whether the MLCT is more recent than the corresponding value in the History Table. If the DDC has changed, the crawler obtains a full list of items (URLs) in that folder, and compares the list with the URLs in the History Table to identify the deleted documents. The deleted documents are then deleted from the History Table and index. If the MLCT is more recent, the crawler queries the document store for the URLs of linked documents having a LCT more recent than the MLCT in the History Table for the folder. The History Table and index are then updated accordingly to reflect the changes to the document store.
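The per-folder DDC and MLCT checks in this abstract can be illustrated with a toy model. Everything about the data layout below is an assumption: the document store is a dictionary of folders, each exposing `ddc`, `mlct`, and a mapping from document URL to LCT, and the index simply records each document's LCT in place of real indexed content.

```python
from typing import Dict


def incremental_crawl(store: Dict[str, dict], history: Dict[str, dict],
                      index: Dict[str, int]) -> None:
    """Update history and index using only per-folder DDC and MLCT checks."""
    for folder, state in store.items():
        seen = history["folders"].setdefault(folder, {"ddc": 0, "mlct": 0})

        # (1) DDC changed: list the folder and diff against the History Table.
        if state["ddc"] != seen["ddc"]:
            current = set(state["docs"])
            known = {u for u, f in history["docs"].items() if f == folder}
            for url in known - current:
                del history["docs"][url]
                index.pop(url, None)
            seen["ddc"] = state["ddc"]

        # (2) MLCT is newer: pick up only documents committed after the stored MLCT.
        if state["mlct"] > seen["mlct"]:
            for url, lct in state["docs"].items():
                if lct > seen["mlct"]:
                    history["docs"][url] = folder
                    index[url] = lct  # placeholder for re-indexing the content
            seen["mlct"] = state["mlct"]


store = {"/f": {"ddc": 0, "mlct": 2, "docs": {"/f/a": 1, "/f/b": 2}}}
history = {"folders": {}, "docs": {}}
index: Dict[str, int] = {}
incremental_crawl(store, history, index)

# Later: document a is deleted and document b is modified.
del store["/f"]["docs"]["/f/a"]
store["/f"]["ddc"] = 1
store["/f"]["docs"]["/f/b"] = 3
store["/f"]["mlct"] = 3
incremental_crawl(store, history, index)
```

A folder whose DDC and MLCT both match the History Table is skipped without listing its contents, which is where the savings over a full crawl come from.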
    • 4. Granted invention patent
    • System and method for locating information in an on-line network
    • US5974409A
    • 1999-10-26
    • US518530
    • 1995-08-23
    • Sankrant Sanu; Alan S. Pearson
    • G06F17/30
    • G06F17/30864; Y10S707/99931; Y10S707/99933
    • The find system of the present invention operates as an extension of a computer's operating system and allows an end-user of an on-line network to enter a search request to locate offerings in different services. In the on-line network, multiple query modules process the search requests by using multiple indexes which associate search terms with offerings in the different services. In addition, multiple find modules balance the processing loads placed on the query modules by selectively routing the search requests to the query modules. In addition to locating offerings in the on-line network, the find system can also establish connections with external data sources and route search requests to the external data sources. Furthermore, the find system provides a fault-tolerant system in which the find modules reroute the search requests to other query modules when errors occur. The find system also contains an indexing module which executes on separate processors and allows service providers to create specialized indexing schemes. These specialized indexes are then periodically updated and transferred to the query modules using techniques which allow search request processing during the update process. Still further, the find system provides an improved method of obtaining up-to-date security clearances for located offerings.
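The load balancing and rerouting behaviour described in this abstract can be sketched as below. The abstract only says that find modules route requests "selectively"; the round-robin policy, the `FindModule` class, and the modelling of query modules as plain callables are assumptions made for this example.

```python
import itertools
from typing import Callable, Sequence, Tuple


class FindModule:
    """Routes each search request to a query module, rerouting on errors."""

    def __init__(self, query_modules: Sequence[Callable[[str], object]]) -> None:
        if not query_modules:
            raise ValueError("at least one query module is required")
        self.query_modules = list(query_modules)
        # Assumed policy: rotate the starting module to spread the load.
        self._starts = itertools.cycle(range(len(self.query_modules)))

    def search(self, request: str) -> object:
        start = next(self._starts)
        count = len(self.query_modules)
        for offset in range(count):
            module = self.query_modules[(start + offset) % count]
            try:
                return module(request)
            except Exception:
                continue  # fault tolerance: try the next query module
        raise RuntimeError("no query module could process the request")


def working_module(name: str) -> Callable[[str], Tuple[str, str]]:
    return lambda request: (name, request)


def broken_module(request: str) -> Tuple[str, str]:
    raise ConnectionError("query module is down")


finder = FindModule([broken_module, working_module("q2")])
first = finder.search("weather")
second = finder.search("news")
```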
    • 5. Granted invention patent
    • Method of web crawling utilizing crawl numbers
    • Retrieving new and updated documents based on a comparison of sequential web crawl numbers
    • US06638314B1
    • 2003-10-28
    • US09105758
    • 1998-06-26
    • Dmitriy Meyerzon; Sankrant Sanu
    • G06F7/02
    • G06F17/30864
    • A computer based system and method of retrieving information pertaining to electronic documents on a computer network is disclosed. The method includes maintaining a database that associates each electronic document with a corresponding crawl number that indicates the most recent crawl during which a change to the document was detected. During a subsequent crawl, electronic documents that have changed since the previous crawl are retrieved, and selected data is stored in a database. The retrieved document information is marked with a crawl number. During subsequent searches, crawl numbers are used to determine documents that have changed since a specified crawl.
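A small sketch of the crawl-number bookkeeping described in this abstract follows. Detecting a change by comparing SHA-256 digests is an assumption of this example, since the abstract does not say how changes are detected. The function names and the dictionary-based database are also hypothetical.

```python
import hashlib
from typing import Dict, List


def crawl(db: Dict[str, dict], documents: Dict[str, str], crawl_number: int) -> None:
    """Stamp each new or changed document with the current crawl number."""
    for url, content in documents.items():
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        entry = db.get(url)
        if entry is None or entry["digest"] != digest:
            db[url] = {"digest": digest, "crawl_number": crawl_number}
        # Unchanged documents keep the crawl number of their last detected change.


def changed_since(db: Dict[str, dict], crawl_number: int) -> List[str]:
    """Return URLs whose last detected change came after the given crawl."""
    return sorted(url for url, entry in db.items()
                  if entry["crawl_number"] > crawl_number)


db: Dict[str, dict] = {}
crawl(db, {"a": "alpha", "b": "beta"}, crawl_number=1)
crawl(db, {"a": "alpha", "b": "beta v2", "c": "gamma"}, crawl_number=2)
recent = changed_since(db, 1)
```

A client that last synchronized at crawl 1 can ask for `changed_since(db, 1)` and receive only the documents it has not yet seen in their current form.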
    • 6. Granted invention patent
    • Enforcing access control on resources at a location other than the source location
    • US06381602B1
    • 2002-04-30
    • US09238012
    • 1999-01-26
    • Srikanth Shoroff; F. Soner Terek; Sankrant Sanu; Andrew Wallace
    • G06F17/30
    • G06F21/6227; G06F2221/2141; Y10S707/959; Y10S707/99931; Y10S707/99939
    • Systems and methods for enforcing access control on secured documents that are stored outside of the direct control of the original application that would normally store and govern access to the documents. Access security can be enforced at a search engine associated with an indexing system that compiles references to documents at any number of network locations. The search engine discloses to the requesting user only those documents that the user is authorized to read. If a document is identified for potential disclosure to a user, and the document's source location has an access control system that is not directly interoperable with a native access control system of the search engine, a security provider at the search engine enforces access control. The security provider, in cooperation with the source location of the document, converts the user context that identifies the requesting user to a format that can be used by the security provider. The security provider also retrieves the access control information from the document's source location. The security provider then applies the user context to the access control information to determine if the user is authorized to read the document.
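The security-provider check in this abstract can be sketched as a result filter at the search engine. The `SecurityProvider` class, the `map_user` and `fetch_acl` callables, the representation of hits as `(source, url)` pairs, and the deny-by-default handling of sources without a provider are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Set, Tuple


class SecurityProvider:
    """Checks read access using the source location's own access control data."""

    def __init__(self, map_user: Callable[[str], str],
                 fetch_acl: Callable[[str], Set[str]]) -> None:
        self.map_user = map_user    # converts the user context to the source's format
        self.fetch_acl = fetch_acl  # retrieves access control info from the source

    def can_read(self, user_context: str, url: str) -> bool:
        principal = self.map_user(user_context)
        return principal in self.fetch_acl(url)


def filter_results(hits: List[Tuple[str, str]], user_context: str,
                   providers: Dict[str, SecurityProvider]) -> List[str]:
    """Keep only the hits that the requesting user is authorized to read."""
    visible = []
    for source, url in hits:
        provider = providers.get(source)
        # Assumption: a source with no registered provider is hidden.
        if provider is not None and provider.can_read(user_context, url):
            visible.append(url)
    return visible


acls = {"notes://db/1": {"CN=alice"}, "notes://db/2": {"CN=bob"}}
notes_provider = SecurityProvider(
    map_user=lambda user: "CN=" + user,
    fetch_acl=lambda url: acls.get(url, set()),
)
hits = [("notes", "notes://db/1"), ("notes", "notes://db/2"), ("other", "x://3")]
visible_to_alice = filter_results(hits, "alice", {"notes": notes_provider})
```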