    • 1. Granted invention patent
    • Enforcing access control on resources at a location other than the source location
    • US06381602B1
    • 2002-04-30
    • US09238012
    • 1999-01-26
    • Srikanth Shoroff, F. Soner Terek, Sankrant Sanu, Andrew Wallace
    • G06F17/30
    • G06F21/6227, G06F2221/2141, Y10S707/959, Y10S707/99931, Y10S707/99939
    • Systems and methods for enforcing access control on secured documents that are stored outside of the direct control of the original application that would normally store and govern access to the documents. Access security can be enforced at a search engine associated with an indexing system that compiles references to documents at any number of network locations. The search engine discloses to the requesting user only those documents that the user is authorized to read. If a document is identified for potential disclosure to a user, and the document's source location has an access control system that is not directly interoperable with a native access control system of the search engine, a security provider at the search engine enforces access control. The security provider, in cooperation with the source location of the document, converts the user context that identifies the requesting user to a format that can be used by the security provider. The security provider also retrieves the access control information from the document's source location. The security provider then applies the user context to the access control information to determine if the user is authorized to read the document.
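The security-provider flow described in this abstract (convert the user context, retrieve the source's access control information, apply one to the other) can be sketched in Python. This is a minimal illustration under stated assumptions, not the patented implementation: the names UserContext, IndexedDoc, SecurityProvider and filter_results, the principal-mapping table that stands in for user-context conversion, the set-based ACLs, and the deny-by-default rule for sources with no registered provider are all hypothetical. Sources whose access control is natively interoperable with the search engine are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserContext:
    """Hypothetical identity of the requesting user as known to the search engine."""
    user: str
    groups: frozenset = frozenset()


@dataclass(frozen=True)
class IndexedDoc:
    url: str
    source: str  # source location that stores and governs the document


class SecurityProvider:
    """Hypothetical per-source provider for a source whose access control
    is not directly interoperable with the search engine's native model."""

    def __init__(self, account_map, acls):
        self.account_map = account_map  # engine principal -> source principal (assumed mapping)
        self.acls = acls                # document URL -> source principals allowed to read

    def convert_user_context(self, ctx):
        # Step 1: convert the user context into principals the source understands.
        principals = {ctx.user, *ctx.groups}
        return {self.account_map[p] for p in principals if p in self.account_map}

    def fetch_acl(self, url):
        # Step 2: retrieve access control information from the source location.
        return self.acls.get(url, set())

    def can_read(self, ctx, url):
        # Step 3: apply the converted user context to the access control information.
        return bool(self.convert_user_context(ctx) & self.fetch_acl(url))


def filter_results(ctx, hits, providers):
    """Disclose only documents the user may read (deny if no provider is registered)."""
    allowed = []
    for doc in hits:
        provider = providers.get(doc.source)
        if provider is not None and provider.can_read(ctx, doc.url):
            allowed.append(doc)
    return allowed


providers = {"fileserver": SecurityProvider(
    account_map={"alice": "corp:alice", "eng": "corp:engineering"},
    acls={"file://fs/spec.doc": {"corp:engineering"}, "file://fs/hr.doc": {"corp:hr"}},
)}
hits = [IndexedDoc("file://fs/spec.doc", "fileserver"), IndexedDoc("file://fs/hr.doc", "fileserver")]
print([d.url for d in filter_results(UserContext("alice", frozenset({"eng"})), hits, providers)])
# prints ['file://fs/spec.doc']
```

Filtering happens after the index lookup, so the index itself can hold references to every document while disclosure is still decided per user at query time.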
    • 2. Granted invention patent
    • Method and system for incremental web crawling
    • US06631369B1
    • 2003-10-07
    • US09345040
    • 1999-06-30
    • Dmitriy Meyerzon, Srikanth Shoroff, F. Soner Terek, Sankrant Sanu
    • G06F17/30
    • G06F17/30864, Y10S707/99934
    • A Web crawler creates an index of documents in a document store on a computer network. In an initial crawl, the crawler creates a first full index for the document store. The first full crawl is based on a set of predefined “seed” URLs and crawl restrictions, and involves recursively retrieving each folder/document directly or indirectly linked to the seed URLs. In the process of creating the first full index, the crawler creates a History Table containing a list of URLs for each folder and document found in the first full crawl. The History Table also includes a local commit time (LCT) for each document and a deleted documents count (DDC) and LCT or maximum LCT (MLCT) for each folder (this assumes that the store supports a folder hierarchy and the MLCT, LCT and DDC properties). Thereafter, in an incremental crawl, the crawler determines, for each folder, (1) whether the DDC for that folder has changed and (2) whether the MLCT is more recent than the corresponding value in the History Table. If the DDC has changed, the crawler obtains a full list of items (URLs) in that folder, and compares the list with the URLs in the History Table to identify the deleted documents. The deleted documents are then deleted from the History Table and index. If the MLCT is more recent, the crawler queries the document store for the URLs of linked documents having an LCT more recent than the MLCT recorded in the History Table for the folder. The History Table and index are then updated accordingly to reflect the changes to the document store.
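The two incremental-crawl checks in this abstract (DDC changed, MLCT advanced) can be sketched in Python against an in-memory store. This is a minimal sketch under assumptions: the store is a flat dict of folder URL to Folder (no seed URLs, crawl restrictions or recursive link-following), MLCT is taken to be the maximum LCT of the folder's current documents, the index is a plain set of URLs, and the names Folder, History, full_crawl, incremental_crawl and query_newer are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical store folder exposing the LCT, MLCT and DDC properties."""
    docs: dict = field(default_factory=dict)  # document URL -> local commit time (LCT)
    ddc: int = 0                              # deleted documents count

    @property
    def mlct(self):
        # Assumed definition: maximum LCT over the folder's current documents.
        return max(self.docs.values(), default=0)

    def delete(self, url):
        del self.docs[url]
        self.ddc += 1

    def query_newer(self, since):
        # Models the store-side query "documents with LCT more recent than `since`".
        return {u: t for u, t in self.docs.items() if t > since}


@dataclass
class History:
    folders: dict = field(default_factory=dict)  # folder URL -> (DDC, MLCT)
    docs: dict = field(default_factory=dict)     # document URL -> (folder URL, LCT)


def full_crawl(store, history, index):
    for furl, folder in store.items():
        for url, lct in folder.docs.items():
            history.docs[url] = (furl, lct)
            index.add(url)
        history.folders[furl] = (folder.ddc, folder.mlct)


def incremental_crawl(store, history, index):
    deleted, refetched = [], []
    for furl, folder in store.items():
        old_ddc, old_mlct = history.folders.get(furl, (None, 0))
        if folder.ddc != old_ddc:
            # (1) DDC changed: list the folder and diff it against the History Table.
            known = {u for u, (f, _) in history.docs.items() if f == furl}
            for url in sorted(known - set(folder.docs)):
                del history.docs[url]
                index.discard(url)
                deleted.append(url)
        if folder.mlct > old_mlct:
            # (2) MLCT advanced: fetch only documents committed after the recorded MLCT.
            for url, lct in sorted(folder.query_newer(old_mlct).items()):
                history.docs[url] = (furl, lct)
                index.add(url)
                refetched.append(url)
        history.folders[furl] = (folder.ddc, folder.mlct)
    return deleted, refetched
```

The DDC check exists because a deletion need not move the MLCT, so comparing timestamps alone would miss removed documents; unchanged folders cost only two property reads.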
    • 3. Granted invention patent
    • Method and system for detecting duplicate documents in web crawls
    • US06547829B1
    • 2003-04-15
    • US09343511
    • 1999-06-30
    • Dmitriy Meyerzon, Srikanth Shoroff, F. Soner Terek, Scott Norin
    • G06F17/00
    • G06F17/30864, Y10S707/99932, Y10S707/99933, Y10S707/99945
    • A Web crawler application takes advantage of a document store's ability to provide a content identifier (CID) having a value that is a unique function of the physical storage location of a data object or document, such as a Web page. In operation, the crawler first tries to fetch the CID for a document. If the CID attribute is not supported by the document store, the crawler fetches the document, filters it to compute a hash value, and commits the document to an index if that hash value is not present in a history table. If the document store supports CIDs, the crawler fetches the CID from the document store. The crawler then determines whether the CID is present in the history table, which indicates whether an identical copy of the document in question has already been indexed under a different URL. If the CID is present, indicating that the document has already been indexed, the new URL is placed in the history file but the document itself is not retrieved from the document store, nor is it filtered again. If the CID is not present in the history table, the full document is retrieved and indexed. The CID data structure is an extension of a known globally unique ID (GUID): whereas the GUID is a 16-byte number, the CID comprises a 16-byte GUID plus an additional 6-byte number.
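The duplicate check in this abstract can be sketched in Python, showing why the CID path saves work: a URL whose CID is already known is recorded without fetching the document. This is a minimal sketch under assumptions: SHA-256 over the raw bytes stands in for the unspecified filter-and-hash step, the 22-byte CID layout (GUID first, then a big-endian 6-byte number) is an illustrative guess at field order, actual indexing is not modeled, and the names make_cid, Store and crawl are hypothetical.

```python
import hashlib
import uuid


def make_cid(guid, counter):
    """Pack a 16-byte GUID plus a 6-byte number into a 22-byte CID.
    Field order and big-endian byte order are assumptions for illustration."""
    return guid.bytes + counter.to_bytes(6, "big")


class Store:
    """Hypothetical document store; `cids` is None when CIDs are unsupported."""

    def __init__(self, docs, cids=None):
        self.docs = docs    # URL -> document bytes
        self.cids = cids    # URL -> CID bytes, or None
        self.fetches = 0    # counts full-document retrievals

    def get_cid(self, url):
        return None if self.cids is None else self.cids.get(url)

    def fetch(self, url):
        self.fetches += 1
        return self.docs[url]


def crawl(urls, store):
    history = {}   # URL -> identity key (CID or content hash)
    seen = set()   # identity keys whose content is already indexed
    index = []     # URLs whose content was actually indexed
    for url in urls:
        cid = store.get_cid(url)
        if cid is not None:
            key = ("cid", cid)
            if key in seen:
                history[url] = key  # duplicate: record the URL, skip the fetch
                continue
            store.fetch(url)        # new content: retrieve it for indexing (not modeled)
        else:
            body = store.fetch(url)  # no CID support: must fetch to compare
            key = ("hash", hashlib.sha256(body).hexdigest())
            if key in seen:
                history[url] = key
                continue
        seen.add(key)
        history[url] = key
        index.append(url)
    return history, index
```

With CIDs, two URLs that resolve to the same stored object cost one fetch; without them, every URL must be fetched and hashed before the duplicate can be detected.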
    • 5. Granted invention patent
    • Systems and methods for fragment-based serialization
    • US07702637B2
    • 2010-04-20
    • US11154496
    • 2005-06-15
    • F. Soner Terek, Ajay Kalhan, Nagavamsi Ponnekanti, Srikumar Rangarajan, Michael J. Zwilling
    • G06F17/00, G06F7/00
    • G06F17/30988, G06F9/4493
    • A method and system for fragment-based serialization places one or more object members in fragments. Fragments may comprise a header and a payload. A header can provide useful information about the fragment, such as an indication of fragment type and an indication of fragment length. A payload may comprise one or more members of an object. Primitive members may be stored in a Binary Fragment with a record format payload. LOB and FS members may be stored in fragments that have a Value Type field for setting forth additional properties of the fragment. Collections may be stored in a series of fragments: a first fragment to indicate the start of a collection, one or more second fragments to serialize collection elements, and a Terminator Fragment to indicate the end of the collection. Fragment-serialized objects minimize storage overhead while providing fast instantiation and low-cost location and updating.
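The fragment layout in the last abstract (header with type and length, a Binary Fragment for primitive members, and a start/elements/Terminator sequence for collections) can be sketched in Python with the standard struct module. This is a minimal sketch under assumptions: the type codes, the 5-byte header (1-byte type, 4-byte little-endian length), and the restriction to int32 members are hypothetical choices, and LOB/FS fragments with a Value Type field are not modeled.

```python
import struct

# Hypothetical fragment type codes; the abstract does not specify values.
BINARY, COLLECTION_START, TERMINATOR = 1, 2, 3
HEADER = struct.Struct("<BI")  # assumed header: 1-byte fragment type, 4-byte payload length


def fragment(ftype, payload=b""):
    return HEADER.pack(ftype, len(payload)) + payload


def serialize(primitives, collection):
    """primitives: tuple of int32 members; collection: list of int32 elements."""
    out = fragment(BINARY, struct.pack(f"<{len(primitives)}i", *primitives))
    out += fragment(COLLECTION_START)             # marks the start of the collection
    for element in collection:
        out += fragment(BINARY, struct.pack("<i", element))
    out += fragment(TERMINATOR)                   # marks the end of the collection
    return out


def iter_fragments(buf):
    """Walk fragments using only their headers; a reader can skip any
    payload it does not need by jumping `length` bytes."""
    pos = 0
    while pos < len(buf):
        ftype, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        yield ftype, buf[pos:pos + length]
        pos += length


def deserialize(buf):
    frags = iter_fragments(buf)
    ftype, payload = next(frags)
    if ftype != BINARY:
        raise ValueError("expected a Binary fragment for primitive members")
    primitives = struct.unpack(f"<{len(payload) // 4}i", payload)
    ftype, _ = next(frags)
    if ftype != COLLECTION_START:
        raise ValueError("expected the start of a collection")
    collection = []
    for ftype, payload in frags:
        if ftype == TERMINATOR:
            break
        collection.append(struct.unpack("<i", payload)[0])
    return primitives, collection
```

For example, serialize((1, 2), [10, 20, 30]) produces six fragments totalling 50 bytes, and deserialize returns ((1, 2), [10, 20, 30]). Because every fragment carries its own length, locating a member means hopping header to header rather than decoding every payload.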