Quick Patent Search: fast search of global patents, free commercial patent database - IPRDB

1. Granted invention patent

US09305091B2 Anchor tag indexing in a web crawler system (Active)
Publication number: US09305091B2
Publication date: 2016-04-05
Application number: US13300516
Filing date: 2011-11-18
Applicants: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
IPC classification: G06F17/00, G06F17/30, G06F17/27, G06F17/22
CPC classification: G06F17/30014, G06F17/2235, G06F17/241, G06F17/2705, G06F17/30321, G06F17/30864
Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.
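
The inversion described in this abstract can be sketched as follows. This is only an illustration, not the patented implementation: the record layout `(source_id, target_id, anchor_text)`, the integer identifiers, and the function name `build_sorted_anchor_map` are all assumptions made for the example.

```python
from collections import defaultdict


def build_sorted_anchor_map(link_log):
    """Invert a link log into target -> sources pairings ordered by target ID.

    link_log is an iterable of (source_id, target_id, anchor_text) records;
    this layout is hypothetical and chosen only for illustration.
    """
    by_target = defaultdict(list)
    for source_id, target_id, anchor_text in link_log:
        by_target[target_id].append((source_id, anchor_text))
    # Order the pairings by target document identifier.
    return [(target_id, sorted(by_target[target_id]))
            for target_id in sorted(by_target)]


link_log = [
    (7, 3, "crawler docs"),
    (2, 9, "home"),
    (5, 3, "indexing guide"),
]
anchor_map = build_sorted_anchor_map(link_log)
```

With this input, every link pointing at document 3 ends up in one entry, ahead of the entry for document 9.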

2. Granted invention patent

US08819000B1 Query modification (Active)
Publication number: US08819000B1
Publication date: 2014-08-26
Application number: US13461315
Filing date: 2012-05-01
Applicants: Anurag Acharya, Alexandre A. Verstak
Inventors: Anurag Acharya, Alexandre A. Verstak
IPC classification: G06F17/30
CPC classification: G06F17/30, G06F17/30672, G06F17/30864
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for query modification. In one aspect, a method includes receiving an original query including a first limitation. First search results responsive to a modified query are obtained, where the first limitation has been omitted from the modified query. One or more common characteristics shared by two or more resources are identified. Each of the two or more resources corresponds to a different highly-ranked result of the first search results. A second modified query including the original query and a second limitation representing the one or more common characteristics is generated. Second search results responsive to the second modified query are obtained. The second search results are provided in a response to the original query.
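
A toy walk-through of the steps in this abstract, under heavy assumptions: the in-memory `corpus`, the `search` helper, ranking by `id`, and the `topic` field used as the "common characteristic" are all hypothetical stand-ins, not anything the patent specifies.

```python
from collections import Counter


def search(corpus, terms, limits):
    """Toy search: documents containing every term and matching every limit."""
    hits = [d for d in corpus
            if all(t in d["text"].split() for t in terms)
            and all(d.get(k) == v for k, v in limits.items())]
    return sorted(hits, key=lambda d: d["id"])  # stand-in for real ranking


def modified_search(corpus, terms, limits, first_limit,
                    fields=("topic",), top_k=3):
    # Step 1: omit the first limitation and fetch the first search results.
    relaxed = {k: v for k, v in limits.items() if k != first_limit}
    first_results = search(corpus, terms, relaxed)[:top_k]
    # Step 2: find characteristics shared by two or more top results.
    counts = Counter((f, d[f]) for d in first_results for f in fields if f in d)
    second_limit = {f: v for (f, v), n in counts.items() if n >= 2}
    # Step 3: original query plus the second limitation.
    return search(corpus, terms, {**limits, **second_limit})


corpus = [
    {"id": 1, "text": "crawler scheduling", "site": "a.example", "topic": "search"},
    {"id": 2, "text": "crawler politeness", "site": "b.example", "topic": "search"},
    {"id": 3, "text": "crawler toy robot", "site": "a.example", "topic": "toys"},
    {"id": 4, "text": "crawler index", "site": "c.example", "topic": "search"},
]
results = modified_search(corpus, ["crawler"], {"site": "a.example"}, "site")
result_ids = [d["id"] for d in results]
```

Here the relaxed query surfaces that two of the top three results share `topic == "search"`, so the second query keeps the original `site` limitation and adds that topic.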

3. Granted invention patent

US08522129B1 Identifying a primary version of a document (Active)
Publication number: US08522129B1
Publication date: 2013-08-27
Application number: US13346436
Filing date: 2012-01-09
Applicants: Alexandre A. Verstak, Anurag Acharya
Inventors: Alexandre A. Verstak, Anurag Acharya
IPC classification: G06F17/22, G06F7/00, G06F17/30
CPC classification: G06F17/2288, G06F17/2211, G06F17/30067, G06F17/3023, G06F17/30309, G06F17/30548
Abstract: A system and method identifies a primary version out of different versions of the same document. The system selects a priority of authority for each document version based on a priority rule and information associated with the document version, and selects a primary version based on the priority of authority and information associated with the document version.
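
One way to picture the two-stage selection is the sketch below. The abstract does not say what the priority rule or the "information associated with the document version" are, so the `PRIORITY_RULE` table, the `source` and `length` fields, and the tie-break on length are purely illustrative assumptions.

```python
# Hypothetical priority rule: higher number means more authoritative source.
PRIORITY_RULE = {"publisher": 3, "repository": 2, "personal_page": 1}


def primary_version(versions):
    """Pick the version with the highest priority; break ties on length."""
    def key(version):
        priority = PRIORITY_RULE.get(version["source"], 0)
        return (priority, version["length"])
    return max(versions, key=key)


versions = [
    {"url": "https://repo.example/p.pdf", "source": "repository", "length": 12000},
    {"url": "https://pub.example/p", "source": "publisher", "length": 11000},
    {"url": "https://home.example/p.pdf", "source": "personal_page", "length": 12500},
]
chosen = primary_version(versions)
```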

4. Granted invention patent

US08484548B1 Anchor tag indexing in a web crawler system (Active)
Publication number: US08484548B1
Publication date: 2013-07-09
Application number: US11936421
Filing date: 2007-11-07
Applicants: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
Inventors: Huican Zhu, Jeffrey Dean, Sanjay Ghemawat, Bwolen Po-Jen Yang, Anurag Acharya
IPC classification: G06F17/00
CPC classification: G06F17/30014, G06F17/2235, G06F17/241, G06F17/2705, G06F17/30321, G06F17/30864
Abstract: Provided is a method and system for indexing documents in a collection of linked documents. A link log, including one or more pairings of source documents and target documents is accessed. A sorted anchor map, containing one or more target document to source document pairings, is generated. The pairings in the sorted anchor map are ordered based on target document identifiers.

5. Granted invention patent

US08234273B2 Document scoring based on document content update (Active)
Publication number: US08234273B2
Publication date: 2012-07-31
Application number: US13174243
Filing date: 2011-06-30
Applicants: Anurag Acharya, Jeffrey Dean, Paul Haahr, Monika Henzinger, Steve Lawrence, Karl Pfleger, Simon Tong
Inventors: Anurag Acharya, Jeffrey Dean, Paul Haahr, Monika Henzinger, Steve Lawrence, Karl Pfleger, Simon Tong
IPC classification: G06F7/00
CPC classification: G06Q30/0246, G06F17/30864, Y10S707/99933
Abstract: A system may determine a measure of how a content of a document changes over time, generate a score for the document based, at least in part, on the measure of how the content of the document changes over time, and rank the document with regard to at least one other document based, at least in part, on the score.
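
The abstract leaves the change measure unspecified. As a minimal sketch, assume the measure is the mean Jaccard distance between the word sets of consecutive snapshots and that the score is simply that measure; both choices, and the names `change_measure` and `rank_by_change`, are assumptions made for this example.

```python
def change_measure(snapshots):
    """Mean Jaccard distance between consecutive snapshots (assumed measure)."""
    if len(snapshots) < 2:
        return 0.0
    distances = []
    for old, new in zip(snapshots, snapshots[1:]):
        a, b = set(old.split()), set(new.split())
        union = a | b
        distances.append(1 - len(a & b) / len(union) if union else 0.0)
    return sum(distances) / len(distances)


def rank_by_change(history):
    """Rank document names by score, here taken to be the change measure."""
    scores = {name: change_measure(snaps) for name, snaps in history.items()}
    return sorted(scores, key=scores.get, reverse=True)


history = {
    "static": ["a b c", "a b c"],
    "fresh": ["a b c", "a b d"],
}
ranking = rank_by_change(history)
```

Whether more change should raise or lower a score is a policy decision the sketch does not settle; it only shows a change measure feeding a ranking.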

6. Invention patent application

US20120173552A1 Assigning Document Identification Tags (Active)
Publication number: US20120173552A1
Publication date: 2012-07-05
Application number: US13419349
Filing date: 2012-03-13
Applicants: Huican Zhu, Anurag Acharya
Inventors: Huican Zhu, Anurag Acharya
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30112, G06F17/303, G06F17/3053, G06F17/30867, G06Q30/02, G06Q30/0246, H04L29/06, H04L29/08072, H04L29/0809, H04L41/12, H04L41/22, H04L63/08, H04L63/102, H04L67/14, H04L67/2804
Abstract: Document identification tags are assigned to documents to be added to a collection of documents. Based on query-independent information about a new document, a document identification tag is assigned to the new document. The document identification tag so assigned is used in the indexing of the new document. When a list of document identification tags are produced by an index in response to a query, the list is approximately ordered with respect to a measure of query-independent relevance. In some embodiments, the measure of query-independent relevance is related to the connectivity matrix of the World Wide Web. In other embodiments, the measure is related to the recency of crawling. In still other embodiments, the measure is a mixture of these two. The provided systems and methods allow for real-time indexing of documents as they are crawled from a collection of documents.
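
A rough sketch of how tags could come out "approximately ordered" by a query-independent score: split the tag space into tiers by score range and hand out the next free tag within a tier. The tier boundaries, the tier size, and the `TagAssigner` class are hypothetical and not taken from the application.

```python
class TagAssigner:
    """Assign integer tags so that lower tags roughly mean higher scores."""

    def __init__(self, tier_bounds=(0.8, 0.5, 0.0), tier_size=1000):
        self.tier_bounds = tier_bounds      # lower score bound of each tier
        self.tier_size = tier_size          # number of tags reserved per tier
        self.next_offset = [0] * len(tier_bounds)

    def assign(self, score):
        for tier, lower in enumerate(self.tier_bounds):
            if score >= lower:
                offset = self.next_offset[tier]
                if offset >= self.tier_size:
                    raise RuntimeError("tier is full")
                self.next_offset[tier] += 1
                return tier * self.tier_size + offset
        raise ValueError("score is below the lowest tier bound")


assigner = TagAssigner()
tags = [assigner.assign(s) for s in (0.9, 0.6, 0.95, 0.1)]
```

Documents can be tagged as they arrive, and a list sorted by tag is ordered by tier but not within a tier, which is why the ordering is only approximate.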

7. Invention patent application

US20120016871A1 DOCUMENT SCORING BASED ON QUERY ANALYSIS (Active)
Publication number: US20120016871A1
Publication date: 2012-01-19
Application number: US13244867
Filing date: 2011-09-26
Applicants: Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Urs Hoelzle, Steve Lawrence, Karl Pfleger, Olcan Sercinoglu, Simon Tong
Inventors: Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Urs Hoelzle, Steve Lawrence, Karl Pfleger, Olcan Sercinoglu, Simon Tong
IPC classification: G06F17/30
CPC classification: G06Q30/0246, G06F17/30864, Y10S707/99933
Abstract: A system may determine an extent to which a document is selected when the document is included in a set of search results, generate a score for the document based, at least in part, on the extent to which the document is selected when the document is included in a set of search results; and rank the document with regard to at least one other document based, at least in part, on the score.
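
As an illustration only, the "extent to which a document is selected" could be approximated by a smoothed click-through rate. The smoothing constants and the names `selection_score` and `stats` are assumptions, and a real system would combine this signal with others.

```python
def selection_score(clicks, impressions, prior_clicks=1.0, prior_impressions=10.0):
    """Smoothed fraction of result-page appearances that led to a selection."""
    return (clicks + prior_clicks) / (impressions + prior_impressions)


# Hypothetical (clicks, impressions) counts per document.
stats = {"doc_a": (30, 90), "doc_b": (5, 90)}
scores = {doc: selection_score(c, n) for doc, (c, n) in stats.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
```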

8. Granted invention patent

US08042112B1 Scheduler for search engine crawler (Active)
Publication number: US08042112B1
Publication date: 2011-10-18
Application number: US10882956
Filing date: 2004-06-30
Applicants: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
Inventors: Huican Zhu, Maximilian Ibel, Anurag Acharya, Howard Bradley Gobioff
IPC classification: G06F9/46, G06F7/00
CPC classification: G06F17/30864
Abstract: A search engine crawler includes a distributed set of schedulers that are associated with one or more segments of document identifiers (e.g., URLs) corresponding to documents on a network (e.g., WWW). Each scheduler handles the scheduling of document identifiers (for crawling) for a subset of the known document identifiers. Using a starting set of document identifiers, such as the document identifiers crawled (or scheduled for crawling) during the most recent completed crawl, the scheduler removes from the starting set those document identifiers that have been unreachable in each of the last X crawls. Other filtering mechanisms may also be used to filter out some of the document identifiers in the starting set. The resulting list of document identifiers is written to a scheduled output file for use in a next crawl cycle.
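
The filtering rule in this abstract (drop identifiers that were unreachable in each of the last X crawls) can be sketched for a single scheduler as below. The `reach_history` structure, a per-URL list of booleans with the oldest crawl first, and the function name are assumptions; segmenting across schedulers and writing the output file are left out.

```python
def schedule_next_crawl(starting_set, reach_history, x=3):
    """Keep every URL except those unreachable in each of the last x crawls."""
    scheduled = []
    for url in starting_set:
        recent = reach_history.get(url, [])[-x:]
        # Only drop a URL when there are x recorded crawls and all of them failed.
        unreachable_every_time = len(recent) == x and not any(recent)
        if not unreachable_every_time:
            scheduled.append(url)
    return scheduled


starting_set = ["http://a.example/", "http://b.example/", "http://c.example/"]
reach_history = {
    "http://a.example/": [True, False, False, False],  # failed the last 3 crawls
    "http://b.example/": [False, False, True],         # reachable last time
    "http://c.example/": [False, False],               # fewer than 3 crawls seen
}
next_crawl = schedule_next_crawl(starting_set, reach_history, x=3)
```

Keeping URLs with fewer than X recorded crawls is a design choice of the sketch, made so that newly discovered identifiers are not dropped too early.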

9. Invention patent application

US20110022605A1 DOCUMENT SCORING BASED ON LINK-BASED CRITERIA (Active)
Publication number: US20110022605A1
Publication date: 2011-01-27
Application number: US12896744
Filing date: 2010-10-01
Applicants: Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Steve Lawrence, Karl Pfleger, Simon Tong
Inventors: Anurag Acharya, Matt Cutts, Jeffrey Dean, Paul Haahr, Monika Henzinger, Steve Lawrence, Karl Pfleger, Simon Tong
IPC classification: G06F17/30
CPC classification: G06Q30/0246, G06F17/30864, Y10S707/99933
Abstract: A method may include receiving a document and an initial score for the document; determining that there has been a decrease in a rate or quantity of new links that point to the document over time; classifying the document as stale in response to the determining; decreasing the initial score for the document, resulting in an updated score; and ranking the document with regard to at least one other document based, at least in part, on the score.
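
A minimal sketch of the staleness step, assuming that "a decrease in the rate of new links" means the average count of new links in the later half of the observation window is lower than in the earlier half. The window split, the `penalty` factor, and the function name `updated_score` are illustrative assumptions.

```python
def updated_score(initial_score, new_links_per_period, penalty=0.5):
    """Return (score, stale) after checking for a drop in new inbound links."""
    half = len(new_links_per_period) // 2
    if half == 0:
        return initial_score, False  # not enough history to judge
    earlier = sum(new_links_per_period[:half]) / half
    recent = sum(new_links_per_period[-half:]) / half
    stale = recent < earlier
    return (initial_score * penalty if stale else initial_score), stale


declining = updated_score(10.0, [12, 10, 3, 1])
growing = updated_score(10.0, [1, 2, 5, 9])
```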

10. Granted invention patent

US07840557B1 Search engine cache control (Active)
Publication number: US07840557B1
Publication date: 2010-11-23
Application number: US10845283
Filing date: 2004-05-12
Applicants: Benjamin T. Smith, Anurag Acharya
Inventors: Benjamin T. Smith, Anurag Acharya
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F12/0875
Abstract: A search query containing at least one term is received at a search controller from a query server and preferably normalized and hashed into a representation of the search query. The representation of the search query is transmitted towards a cache containing multiple query result entries. Each query result entry contains a list of documents associated with the previously searched search query. The cache is then searched and query result entries for the search query are sent to the search controller from the cache. Subsequently, it is determined whether the query result entries are current versions for the search query. If the query result entries are not the current versions, then current versions of the query result entries are obtained.
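
The normalize, hash, look up, and check-currency flow can be sketched as below. How currency is decided is not stated in the abstract; here it is assumed to be a comparison against an `index_version` counter. The `ResultCache` class, `query_key`, the lowercase-and-collapse-whitespace normalization, and the use of SHA-1 are likewise assumptions for the example.

```python
import hashlib


def query_key(query):
    """Normalize the query (lowercase, collapse whitespace) and hash it."""
    normalized = " ".join(query.lower().split())
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


class ResultCache:
    def __init__(self, backend_search, index_version):
        self.backend_search = backend_search  # callable: query -> list of docs
        self.index_version = index_version    # assumed freshness marker
        self.entries = {}                     # key -> (version, results)

    def lookup(self, query):
        key = query_key(query)
        entry = self.entries.get(key)
        if entry is not None and entry[0] == self.index_version:
            return entry[1]                   # cached entry is current
        results = self.backend_search(query)  # obtain the current version
        self.entries[key] = (self.index_version, results)
        return results


calls = []


def fake_search(query):
    calls.append(query)
    return ["doc-%d" % len(calls)]


cache = ResultCache(fake_search, index_version=1)
first = cache.lookup("Web  Crawler")
second = cache.lookup("web crawler")   # same key after normalization
cache.index_version = 2                # simulate an index update
third = cache.lookup("web crawler")    # stale entry is refreshed
```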
