Quick Patent Search: fast search of global patents, free commercial patent database - IPRDB

41. Patent application

US20090319505A1 TECHNIQUES FOR EXTRACTING AUTHORSHIP DATES OF DOCUMENTS - Status: pending (published)
Publication number: US20090319505A1
Publication date: 2009-12-24
Application number: US12141935
Filing date: 2008-06-19
Applicants: Hang Li, Yunhua Hu, Guangping Gao, Yauhen Shnitko, Dmitriy Meyerzon, David Mowatt
Inventors: Hang Li, Yunhua Hu, Guangping Gao, Yauhen Shnitko, Dmitriy Meyerzon, David Mowatt
IPC classification: G06F7/06, G06F17/30
CPC classification: G06F17/2765, G06N3/04
Abstract: Various technologies and techniques are disclosed for calculating authorship dates for a document. A portion of the document in which to look for possible authorship dates is determined. The possible authorship dates are extracted from that portion of the document. A revised authorship date of the document is generated using a neural network. The revised authorship date is returned to the application or process that requested the date.
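The first two steps of this abstract (choosing a portion of the document, then extracting candidate dates from it) can be sketched as follows. This is only an illustration under stated assumptions: the function name `candidate_dates`, the 500-character window, and the ISO-only date pattern are hypothetical, and the neural-network step that produces the revised date is not shown.

```python
import re
from datetime import date

# Assumption: only ISO-style dates (YYYY-MM-DD) are recognised here.
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")

def candidate_dates(text, head_chars=500):
    """Return the valid dates found in the first head_chars characters of text."""
    found = []
    for year, month, day in DATE_RE.findall(text[:head_chars]):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2008-13-40
    return found
```

A later stage would score these candidates; the abstract only says that a neural network does so, without giving details.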

42. Patent application

US20060200464A1 Method and system for generating a document summary - Status: pending (published)
Publication number: US20060200464A1
Publication date: 2006-09-07
Application number: US11072734
Filing date: 2005-03-03
Applicants: Michal Gideoni, David Lee, Dmitriy Meyerzon, Mihai Petriuc, Kyle Peltonen
Inventors: Michal Gideoni, David Lee, Dmitriy Meyerzon, Mihai Petriuc, Kyle Peltonen
IPC classification: G06F17/30, G06F7/00
CPC classification: G06F16/338, G06F16/345
Abstract: A text document is segmented into word and sentence information when the document is first presented and indexed. A memory stream is generated for the document. The memory stream includes document title information, word offsets, sentence offsets, the alternate list, and the contents of the document. The memory stream is used to determine which sentences in the document include query terms. The sentences that include query terms are ranked according to a ranking algorithm. The ranking algorithm determines which sentences include the highest number of query terms and the number of occurrences of the query terms in each sentence. A predetermined number of sentences that together contain as many query terms as possible are selected such that the sentences that are most representative of the document with respect to the query are included in the summary. The summary is generated at query time by concatenating the selected sentences with the query terms highlighted.
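The sentence-selection step described in this abstract can be illustrated with a greedy sketch. The greedy strategy, the function name `summarize`, and the `max_sentences` parameter are assumptions made for illustration, not the ranking algorithm claimed in the patent; the memory stream and term highlighting are omitted.

```python
import re

def summarize(sentences, query_terms, max_sentences=2):
    """Pick up to max_sentences sentences that together cover many query terms."""
    terms = {t.lower() for t in query_terms}
    remaining = set(terms)  # query terms not yet covered by a chosen sentence
    candidates = []
    for idx, sentence in enumerate(sentences):
        hits = [w for w in re.findall(r"\w+", sentence.lower()) if w in terms]
        if hits:
            candidates.append((idx, set(hits), len(hits)))
    chosen = []
    while candidates and len(chosen) < max_sentences:
        # Prefer new term coverage, then total occurrences, then earlier position.
        best = max(candidates, key=lambda c: (len(c[1] & remaining), c[2], -c[0]))
        candidates.remove(best)
        chosen.append(best[0])
        remaining -= best[1]
    # Concatenation order follows the original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Sentences without any query term are never selected, so a query that matches nothing yields an empty summary.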

43. Patent application

US20060136411A1 Ranking search results using feature extraction - Status: lapsed
Publication number: US20060136411A1
Publication date: 2006-06-22
Application number: US11019091
Filing date: 2004-12-21
Applicants: Dmitriy Meyerzon, Hang Li
Inventors: Dmitriy Meyerzon, Hang Li
IPC classification: G06F17/30
CPC classification: G06F17/30684
Abstract: Methods and computer-readable media are provided for ranking search results using feature extraction data. Each of the results of a search engine query is parsed to obtain data, such as text, formatting information, metadata, and the like. The text, the formatting information and the metadata are passed through a feature extraction application to extract data that may be used to improve a ranking of the search results based on relevance of the search results to the search engine query. The feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text. The extracted titles, the text, the formatting information and the metadata for any given search results item are processed according to a field weighting application for determining a ranking of the given search results item. Ranked search results items may then be displayed according to ranking.
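As a rough illustration of the two ideas in this abstract, the sketch below extracts a title from formatting information and then scores a result with per-field weights. Both heuristics are assumptions: picking the text run with the largest font size as the title, and a score equal to weighted term counts, are stand-ins chosen for brevity, and the names `extract_title` and `field_weighted_score` are hypothetical.

```python
def extract_title(runs):
    """runs: list of (text, font_size). Assumption: the largest run is the title."""
    return max(runs, key=lambda run: run[1])[0]

def field_weighted_score(query_terms, fields, weights):
    """Sum query-term occurrences per field, scaled by that field's weight."""
    score = 0.0
    for name, text in fields.items():
        words = text.lower().split()
        hits = sum(words.count(term.lower()) for term in query_terms)
        score += weights.get(name, 1.0) * hits  # unlisted fields get weight 1.0
    return score
```

With a title weight of 3.0 and a body weight of 1.0, one title hit plus two body hits scores 5.0, so a match in the extracted title counts for more than a match in the body.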

44. Granted patent

US07065523B2 Scoping queries in a search engine - Status: lapsed
Publication number: US07065523B2
Publication date: 2006-06-20
Application number: US10959330
Filing date: 2004-10-06
Applicants: Kyle Peltonen, Dmitriy Meyerzon
Inventors: Kyle Peltonen, Dmitriy Meyerzon
IPC classification: G06F17/30
CPC classification: G06F17/30867, Y10S707/99931, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99942, Y10S707/99943, Y10S707/99944, Y10S707/99945
Abstract: Systems and methods for scoping a search. When a content index for electronic data is built, one or more scope restrictions are included in the content index. The scope restriction may be, for example, a root folder identifier, a mailbox identifier, or a URL. Because the scope restriction is included in the content index, random access of the property store to determine the scope is avoided. Rather, the scope restriction is implicitly added to a search that uses the content index. By including a scope restriction in the search query, the search results identified from the content index are limited to results that match the scope restriction. Advantageously, the effect of including the scope restriction in the search is ignored if the search results are relatively small or when including the scope restriction provides little benefit.
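One way to picture a scope restriction stored inside the content index is a toy inverted index in which the scope is just another posting list. The class name `ContentIndex`, the `scope:` prefix, and the whitespace tokenizer are assumptions for illustration; the optimisation of ignoring the scope for small result sets is not modelled.

```python
from collections import defaultdict

class ContentIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of document ids

    def add(self, doc_id, text, scope):
        for word in text.lower().split():
            self.postings[word].add(doc_id)
        # Assumption: the scope is stored as a reserved term next to content terms.
        self.postings["scope:" + scope].add(doc_id)

    def search(self, word, scope=None):
        result = set(self.postings.get(word.lower(), set()))
        if scope is not None:
            # Intersect posting lists; no per-document property lookup is needed.
            result &= self.postings.get("scope:" + scope, set())
        return result
```

Because the restriction is resolved by intersecting posting lists, the query never has to read each candidate's properties from a separate store.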

45. Patent application

US20060074911A1 System and method for batched indexing of network documents - Status: lapsed
Publication number: US20060074911A1
Publication date: 2006-04-06
Application number: US10956891
Filing date: 2004-09-30
Applicants: Mircea Neagovici-Negoescu, David Lee, Kyle Peltonen, Dmitriy Meyerzon
Inventors: Mircea Neagovici-Negoescu, David Lee, Kyle Peltonen, Dmitriy Meyerzon
IPC classification: G06F17/30
CPC classification: G06F17/30861
Abstract: A process takes advantage of a structure of a server hosting a network site that includes a change log stored in a database to batch index documents for search queries. The content of the site is batched and shipped in bulk from the server to an indexer. The change log keeps track of the changes to the content of the site. The indexer incrementally requests updates to the index using the change log and batches the changes so that the bandwidth usage and processor overhead costs are reduced.
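The incremental, batched use of a change log can be sketched as below. The list-based log, the integer position token, and the names `changes_since` and `sync` are assumptions made for this example; a real server would expose the log through some request interface that the abstract does not describe.

```python
def changes_since(change_log, token, batch_size):
    """Return (batch, new_token): up to batch_size log entries after position token."""
    batch = change_log[token:token + batch_size]
    return batch, token + len(batch)

def sync(index, change_log, token=0, batch_size=2):
    """Apply all log entries after token to index, one batch per request."""
    batches = 0
    while True:
        batch, token = changes_since(change_log, token, batch_size)
        if not batch:
            break
        batches += 1
        for doc_id, content in batch:
            if content is None:          # None marks a deleted document
                index.pop(doc_id, None)
            else:
                index[doc_id] = content
    return token, batches
```

Keeping the returned token lets the next run fetch only entries logged since the previous run, rather than re-reading the whole site.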

46. Patent application

US20060074865A1 System and method for scoping searches using index keys - Status: in force
Publication number: US20060074865A1
Publication date: 2006-04-06
Application number: US10951123
Filing date: 2004-09-27
Applicants: Chadd Merrigan, Kyle Peltonen, Dmitriy Meyerzon, David Lee
Inventors: Chadd Merrigan, Kyle Peltonen, Dmitriy Meyerzon, David Lee
IPC classification: G06F17/30
CPC classification: G06F17/30864, G06F17/30967, Y10S707/99931, Y10S707/99932, Y10S707/99933, Y10S707/99936, Y10S707/99943
Abstract: An index search system includes a set of index keys that are associated with the scope of the search rather than the content of the documents that are the target of the search. These scope-related index keys, or scope keys, allow the scope of the search to be selected, reducing the number of documents that a search is required to sift through to obtain results. Furthermore, compound scopes are recognized and stored such that an index of complex search scopes is provided to eliminate rehashing of the searches based on these complex search scopes.
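The idea of storing compound scopes so they are not recomputed can be sketched with a small cache. Treating a compound scope as the union of its member scopes, and the names `ScopeIndex` and `docs_in`, are assumptions for illustration; the abstract does not say how compound scopes are combined.

```python
class ScopeIndex:
    def __init__(self):
        self.scope_keys = {}  # scope key -> set of document ids
        self.compound = {}    # frozenset of scope keys -> cached document set

    def add(self, doc_id, scope_key):
        self.scope_keys.setdefault(scope_key, set()).add(doc_id)
        self.compound.clear()  # cached compound scopes may now be stale

    def docs_in(self, *keys):
        """Documents in any of the given scopes, computed once per combination."""
        compound_key = frozenset(keys)
        if compound_key not in self.compound:
            docs = set()
            for key in compound_key:
                docs |= self.scope_keys.get(key, set())
            self.compound[compound_key] = docs
        return self.compound[compound_key]
```

A repeated query over the same combination of scopes is then answered from the cache instead of re-merging the individual scope keys.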

47. Patent application

US20050086583A1 Proxy server using a statistical model - Status: lapsed
Publication number: US20050086583A1
Publication date: 2005-04-21
Application number: US10981962
Filing date: 2004-11-05
Applicants: Kenji Obata, Dmitriy Meyerzon
Inventors: Kenji Obata, Dmitriy Meyerzon
IPC classification: G06F17/30, G06F15/00
CPC classification: G06F17/30864, Y10S707/99931, Y10S707/99933
Abstract: A computer-based system and method of determining whether to re-fetch a previously retrieved document across a computer network is disclosed. The method utilizes a statistical model to determine whether the previously retrieved document has likely changed since it was last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the actual experience with change rate patterns of the documents accessed. The decision of whether to access the document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes and this information is used to train the statistical model.
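The decide-then-train loop in this abstract can be illustrated with a deliberately simple estimator. This is a swapped-in stand-in, not the patent's statistical model: the Laplace-smoothed change frequency, the class name `ChangeModel`, and the default threshold and age limit are all assumptions, and the random-selection criterion is left out to keep the example deterministic.

```python
class ChangeModel:
    def __init__(self):
        self.checks = 0   # how many times the document was re-checked
        self.changes = 0  # how many of those checks found a change

    def p_change(self):
        # Laplace-smoothed fraction of past checks that found a change.
        return (self.changes + 1) / (self.checks + 2)

    def should_refetch(self, age_hours, threshold=0.5, max_age_hours=168):
        # Re-fetch if the copy is too old or a change is judged likely enough.
        return age_hours >= max_age_hours or self.p_change() >= threshold

    def train(self, changed):
        # Feed the outcome of an actual check back into the model.
        self.checks += 1
        self.changes += int(changed)
```

After several checks that find no change, the estimated probability drops and the document is skipped until the age limit forces a re-fetch.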

48. Patent application

US20050044074A1 Scoping queries in a search engine - Status: lapsed
Publication number: US20050044074A1
Publication date: 2005-02-24
Application number: US10959330
Filing date: 2004-10-06
Applicants: Kyle Peltonen, Dmitriy Meyerzon
Inventors: Kyle Peltonen, Dmitriy Meyerzon
IPC classification: G06F17/30
CPC classification: G06F17/30867, Y10S707/99931, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99942, Y10S707/99943, Y10S707/99944, Y10S707/99945
Abstract: Systems and methods for scoping a search. When a content index for electronic data is built, one or more scope restrictions are included in the content index. The scope restriction may be, for example, a root folder identifier, a mailbox identifier, or a URL. Because the scope restriction is included in the content index, random access of the property store to determine the scope is avoided. Rather, the scope restriction is implicitly added to a search that uses the content index. By including a scope restriction in the search query, the search results identified from the content index are limited to results that match the scope restriction. Advantageously, the effect of including the scope restriction in the search is ignored if the search results are relatively small or when including the scope restriction provides little benefit.

49. Granted patent

US06638314B1 Method of web crawling utilizing crawl numbers - Status: lapsed
Title as translated in the Chinese listing: Retrieving new and updated documents based on a comparison of sequential web crawl numbers
Publication number: US06638314B1
Publication date: 2003-10-28
Application number: US09105758
Filing date: 1998-06-26
Applicants: Dmitriy Meyerzon, Sankrant Sanu
Inventors: Dmitriy Meyerzon, Sankrant Sanu
IPC classification: G06F7/02
CPC classification: G06F17/30864
Abstract: A computer-based system and method of retrieving information pertaining to electronic documents on a computer network is disclosed. The method includes maintaining a database that associates each electronic document with a corresponding crawl number that indicates the most recent crawl during which a change to the document was detected. During a subsequent crawl, electronic documents that have changed since the previous crawl are retrieved, and selected data is stored in a database. The retrieved document information is marked with a crawl number. During subsequent searches, crawl numbers are used to determine documents that have changed since a specified crawl.
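A minimal sketch of the crawl-number bookkeeping follows. The dictionary used as the database, the SHA-256 content digest used to detect a change, and the names `crawl` and `changed_since` are assumptions for this example; the abstract does not state how a change is detected.

```python
import hashlib

def crawl(db, fetch, urls, crawl_number):
    """Record crawl_number for every URL whose content differs from the stored copy."""
    for url in urls:
        digest = hashlib.sha256(fetch(url).encode("utf-8")).hexdigest()
        record = db.get(url)
        if record is None or record["digest"] != digest:
            # New or modified document: mark it with the current crawl number.
            db[url] = {"digest": digest, "crawl": crawl_number}

def changed_since(db, crawl_number):
    """URLs whose last detected change came after the given crawl."""
    return sorted(url for url, rec in db.items() if rec["crawl"] > crawl_number)
```

Unchanged documents keep their old crawl number, so a search for "changed since crawl N" is a simple comparison against the stored number.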

50. Granted patent

US06199081B1 Automatic tagging of documents and exclusion by content - Status: lapsed
Publication number: US06199081B1
Publication date: 2001-03-06
Application number: US09107225
Filing date: 1998-06-30
Applicants: Dmitriy Meyerzon, William G. Nichols
Inventors: Dmitriy Meyerzon, William G. Nichols
IPC classification: G06F17/21
CPC classification: G06F17/24, G06F17/218, G06F17/2264, Y10S707/99936
Abstract: A computer-based method and system for processing data obtained from documents retrieved from a computer network during a gathering project is disclosed. Plugging modular active and consumer plug-ins into the gathering project configures the information processing capability of the gathering process that retrieves the documents. The gathering process retrieves a copy of an electronic document from a server connected to the computer network and returns a document data stream that includes the retrieved document's data and its “properties.” One or more active plug-ins plugged into the gathering process are used to add, delete or modify the properties in the document data stream based on the document's contents or properties. The modified document data stream is then passed to one or more consumer plug-ins that use the properties in the modified document data stream to process the document in some manner. An active plug-in can prevent any part of the document data stream from being forwarded to subsequent active or consumer plug-ins in the project. An active plug-in can also control the consumer plug-ins by instructing them to abort processing of a particular document after analyzing some of the document's contents while the document is being processed.
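The division of labour between active and consumer plug-ins can be sketched as a small pipeline. Representing a document as a dictionary, signalling exclusion by returning `None`, and the plug-in names `tag_confidential` and `exclude_drafts` are assumptions for illustration; aborting a consumer partway through a document is not modelled.

```python
def run_pipeline(document, active_plugins, consumer_plugins):
    """document: dict with 'content' and 'properties'. Returns True if consumed."""
    for plugin in active_plugins:
        document = plugin(document)
        if document is None:      # an active plug-in excluded the document
            return False
    for consumer in consumer_plugins:
        consumer(document)        # e.g. an indexer reading the tagged properties
    return True

def tag_confidential(doc):
    """Active plug-in: add a property based on the document's contents."""
    doc["properties"]["confidential"] = "confidential" in doc["content"].lower()
    return doc

def exclude_drafts(doc):
    """Active plug-in: stop drafts from reaching later plug-ins."""
    return None if "draft" in doc["content"].lower() else doc
```

For example, passing `[tag_confidential, exclude_drafts]` as active plug-ins and a list's `append` method as the only consumer collects tagged documents and silently drops drafts.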
