    • 42. Patent application
    • Method and system for generating a document summary
    • US20060200464A1
    • 2006-09-07
    • US11072734
    • 2005-03-03
    • Michal Gideoni, David Lee, Dmitriy Meyerzon, Mihai Petriuc, Kyle Peltonen
    • G06F17/30, G06F7/00
    • G06F16/338, G06F16/345
    • A text document is segmented into word and sentence information when the document is first presented and indexed. A memory stream is generated for the document. The memory stream includes document title information, word offsets, sentence offsets, the alternate list, and the contents of the document. The memory stream is used to determine which sentences in the document include query terms. The sentences that include query terms are ranked according to a ranking algorithm. The ranking algorithm determines which sentences include the highest number of query terms and the number of occurrences of the query terms in each sentence. A predetermined number of sentences that together contain as many query terms as possible are selected such that the sentences that are most representative of the document with respect to the query are included in the summary. The summary is generated at query time by concatenating the selected sentences with the query terms highlighted.
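The selection step in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the regex sentence splitter, the greedy coverage rule, the `summarize` name, and the `**term**` highlighting are all assumptions made for the example, and the precomputed memory stream of word and sentence offsets is replaced by splitting at query time.

```python
import re

def summarize(text, query_terms, max_sentences=2):
    """Pick sentences that together cover the most query terms, then highlight them."""
    terms = {t.lower() for t in query_terms}
    if not terms:
        return ""
    # Naive sentence split (assumption; the abstract uses stored sentence offsets).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    candidates = []
    for idx, sentence in enumerate(sentences):
        hits = [w for w in re.findall(r"\w+", sentence.lower()) if w in terms]
        if hits:
            # (position, sentence, distinct query terms, total occurrences)
            candidates.append((idx, sentence, set(hits), len(hits)))

    chosen, covered = [], set()
    while candidates and len(chosen) < max_sentences:
        # Greedy rule: most not-yet-covered terms, then most distinct terms,
        # then most occurrences, then earliest position.
        best = max(candidates,
                   key=lambda c: (len(c[2] - covered), len(c[2]), c[3], -c[0]))
        chosen.append(best)
        covered |= best[2]
        candidates.remove(best)

    chosen.sort(key=lambda c: c[0])  # restore document order
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in sorted(terms)) + r")\b",
        re.IGNORECASE,
    )
    return " ".join(pattern.sub(lambda m: "**" + m.group(0) + "**", c[1])
                    for c in chosen)
```

For example, `summarize(text, ["index", "documents"], max_sentences=1)` returns the single earliest sentence that contains both terms, with each term wrapped in `**`.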
    • 43. Patent application
    • Ranking search results using feature extraction
    • US20060136411A1
    • 2006-06-22
    • US11019091
    • 2004-12-21
    • Dmitriy Meyerzon, Hang Li
    • G06F17/30
    • G06F17/30684
    • Methods and computer-readable media are provided for ranking search results using feature extraction data. Each of the results of a search engine query is parsed to obtain data, such as text, formatting information, metadata, and the like. The text, the formatting information and the metadata are passed through a feature extraction application to extract data that may be used to improve a ranking of the search results based on relevance of the search results to the search engine query. The feature extraction application extracts features, such as titles, found in any of the text based on formatting information applied to or associated with the text. The extracted titles, the text, the formatting information and the metadata for any given search results item are processed according to a field weighting application for determining a ranking of the given search results item. Ranked search results items may then be displayed according to ranking.
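One way to picture the two stages described above (title extraction from formatting, then field weighting) is the sketch below. The `Run` type, the "largest font wins" heuristic, the weight values, and the term-count scoring are all hypothetical choices for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """A span of text with formatting information (hypothetical structure)."""
    text: str
    font_size: float
    bold: bool = False

def extract_title(runs):
    # Heuristic assumption: the title is the run with the largest font;
    # bold breaks ties, then the earliest run wins.
    if not runs:
        return ""
    _, best = max(enumerate(runs),
                  key=lambda p: (p[1].font_size, p[1].bold, -p[0]))
    return best.text

# Illustrative weights only; real values would be tuned.
FIELD_WEIGHTS = {"extracted_title": 3.0, "metadata_title": 2.0, "body": 1.0}

def score(fields, query_terms):
    """Weighted count of query-term occurrences across fields."""
    total = 0.0
    for name, weight in FIELD_WEIGHTS.items():
        words = fields.get(name, "").lower().split()
        total += weight * sum(words.count(t.lower()) for t in query_terms)
    return total

def rank(docs, query_terms):
    """Return document ids ordered from highest to lowest score."""
    scored = []
    for doc in docs:
        fields = {
            "extracted_title": extract_title(doc["runs"]),
            "metadata_title": doc.get("metadata_title", ""),
            "body": " ".join(r.text for r in doc["runs"]),
        }
        scored.append((score(fields, query_terms), doc["id"]))
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: -p[0])]
```

The point of the extracted title is that a document whose metadata title is uninformative (for instance a default file name) can still be ranked on the heading that actually appears in its text.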
    • 47. Patent application
    • Proxy server using a statistical model
    • US20050086583A1
    • 2005-04-21
    • US10981962
    • 2004-11-05
    • Kenji Obata, Dmitriy Meyerzon
    • G06F17/30, G06F15/00
    • G06F17/30864, Y10S707/99931, Y10S707/99933
    • A computer-based system and method of determining whether to re-fetch a previously retrieved document across a computer network is disclosed. The method utilizes a statistical model to determine whether the previously retrieved document has likely changed since it was last accessed. The statistical model continuously improves its accuracy by training internal probability distributions to reflect the actual experience with change rate patterns of the documents accessed. The decision of whether to access the document is based on the probability of change compared against a desired synchronization level, random selections, maximum limits on the amount of time since the document was last accessed, and other criteria. Once the decision to access is made, the document is checked for changes and this information is used to train the statistical model.
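The decision logic in this abstract can be sketched as below. Everything concrete here is an assumption for illustration: the model is reduced to a single smoothed change frequency, the threshold is taken to be `1 - sync_level`, and the names `ChangeModel` and `should_refetch` and their parameters are hypothetical.

```python
import random

class ChangeModel:
    """Tiny stand-in for the statistical model: a smoothed change frequency."""

    def __init__(self):
        # Start from one observed change in two checks (a uniform prior).
        self.changed = 1
        self.checked = 2

    def probability_of_change(self):
        return self.changed / self.checked

    def train(self, did_change):
        # Called after every actual access, whether or not the document changed.
        self.checked += 1
        self.changed += int(did_change)

def should_refetch(model, age_seconds, sync_level=0.9,
                   max_age_seconds=86400, random_rate=0.05, rng=random):
    # Hard limit on the time since the document was last accessed.
    if age_seconds >= max_age_seconds:
        return True
    # Occasional random selection keeps the model supplied with fresh samples.
    if rng.random() < random_rate:
        return True
    # Assumed rule: re-fetch when the chance of a stale copy exceeds
    # what the desired synchronization level tolerates.
    return model.probability_of_change() > 1.0 - sync_level
```

After each re-fetch the caller would compare the new copy with the cached one and pass the result to `model.train(...)`, which is the feedback loop the abstract describes.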
    • 48. Patent application
    • Scoping queries in a search engine
    • US20050044074A1
    • 2005-02-24
    • US10959330
    • 2004-10-06
    • Kyle Peltonen, Dmitriy Meyerzon
    • G06F17/30
    • G06F17/30867, Y10S707/99931, Y10S707/99933, Y10S707/99934, Y10S707/99935, Y10S707/99936, Y10S707/99942, Y10S707/99943, Y10S707/99944, Y10S707/99945
    • Systems and methods for scoping a search. When a content index for electronic data is built, one or more scope restrictions are included in the content index. The scope restriction may be, for example, a root folder identifier, a mailbox identifier, or a URL. Because the scope restriction is included in the content index, random access of the property store to determine the scope is avoided. Rather, the scope restriction is implicitly added to a search that uses the content index. By including a scope restriction in the search query, the search results identified from the content index are limited to results that match the scope restriction. Advantageously, the effect of including the scope restriction in the search is ignored if the search results are relatively small or when including the scope restriction provides little benefit.
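The idea of storing the scope inside the content index can be illustrated with a toy inverted index. This is a sketch under stated assumptions: the `ContentIndex` class, the `("word", ...)` and `("scope", ...)` key layout, and the folder-path scopes are hypothetical, and the optimization of skipping the scope for small result sets is not modeled.

```python
from collections import defaultdict

class ContentIndex:
    """Toy inverted index that stores scope restrictions next to word postings."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text, scope):
        for word in text.lower().split():
            self.postings[("word", word)].add(doc_id)
        # Index every ancestor folder so a search can be scoped at any level.
        parts = scope.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            self.postings[("scope", "/" + "/".join(parts[:i]))].add(doc_id)

    def search(self, word, scope=None):
        hits = set(self.postings.get(("word", word.lower()), set()))
        if scope is not None:
            # The scope is just another posting list to intersect,
            # so no per-document property lookup is needed.
            hits &= self.postings.get(("scope", scope), set())
        return sorted(hits)
```

Because the scope is resolved by a set intersection inside the index, the cost of scoping no longer depends on reading a property record for each candidate document.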
    • 49. Granted patent
    • Method of web crawling utilizing crawl numbers
    • Retrieving new and updated documents based on a comparison of sequential web crawl numbers
    • US06638314B1
    • 2003-10-28
    • US09105758
    • 1998-06-26
    • Dmitriy Meyerzon, Sankrant Sanu
    • G06F7/02
    • G06F17/30864
    • A computer based system and method of retrieving information pertaining to electronic documents on a computer network is disclosed. The method includes maintaining a database that associates each electronic document with a corresponding crawl number that indicates the most recent crawl during which a change to the document was detected. During a subsequent crawl, electronic documents that have changed since the previous crawl are retrieved, and selected data is stored in a database. The retrieved document information is marked with a crawl number. During subsequent searches, crawl numbers are used to determine documents that have changed since a specified crawl.
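The crawl-number bookkeeping can be sketched as follows. This is a minimal illustration under assumptions: change detection by SHA-256 content hash, the in-memory dictionary, and the names `CrawlDatabase`, `crawl`, and `changed_since` are not taken from the patent.

```python
import hashlib

class CrawlDatabase:
    """Associates each document with the crawl number of its last detected change."""

    def __init__(self):
        self.crawl_number = 0
        self.records = {}  # url -> (content_hash, crawl_number_of_last_change)

    def crawl(self, fetched):
        """Run one crawl over `fetched`, a mapping of url -> document content."""
        self.crawl_number += 1
        for url, content in fetched.items():
            digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
            previous = self.records.get(url)
            # New or modified documents are stamped with the current crawl number;
            # unchanged documents keep their old stamp.
            if previous is None or previous[0] != digest:
                self.records[url] = (digest, self.crawl_number)
        return self.crawl_number

    def changed_since(self, crawl_number):
        """List documents whose last detected change came after the given crawl."""
        return sorted(url for url, (_, n) in self.records.items()
                      if n > crawl_number)
```

A client that last synchronized at crawl 1 would call `changed_since(1)` and receive only the documents that were added or modified in later crawls.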
    • 50. Granted patent
    • Automatic tagging of documents and exclusion by content
    • US06199081B1
    • 2001-03-06
    • US09107225
    • 1998-06-30
    • Dmitriy Meyerzon, William G. Nichols
    • G06F17/21
    • G06F17/24, G06F17/218, G06F17/2264, Y10S707/99936
    • A computer-based method and system for processing data obtained from documents retrieved from a computer network during a gathering project is disclosed. Plugging modular active and consumer plug-ins into the gathering project configures the information processing capability of the gathering process that retrieves the documents. The gathering process retrieves a copy of an electronic document from a server connected to the computer network and returns a document data stream that includes the retrieved document's data and its “properties.” One or more active plug-ins plugged in to the gathering process are used to add, delete or modify the properties in the document data stream based on the document's contents or properties. The modified document data stream is then passed to one or more consumer plug-ins that use the properties in the modified document data stream to process the document in some manner. An active plug-in can prevent any part of the document data stream from being forwarded to subsequent active or consumer plug-ins in the project. An active plug-in can also control the consumer plug-ins by instructing them to abort processing of a particular document after analyzing some of the document's contents while the document is being processed.
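The plug-in chain described above can be sketched as a small pipeline. This is an illustration only: the dictionary-based document data stream, the convention that an active plug-in returns `None` to exclude a document, and the names `run_pipeline`, `exclude_drafts`, and `tag_length` are all hypothetical.

```python
def run_pipeline(document, active_plugins, consumer_plugins):
    """Pass a document data stream through active plug-ins, then consumers."""
    stream = {"data": document["data"],
              "properties": dict(document["properties"])}
    for plugin in active_plugins:
        stream = plugin(stream)
        if stream is None:
            # An active plug-in excluded the document: nothing is forwarded
            # to later active plug-ins or to any consumer.
            return None
    for consumer in consumer_plugins:
        consumer(stream)
    return stream

def exclude_drafts(stream):
    """Active plug-in: exclusion by content (hypothetical rule)."""
    return None if "DRAFT" in stream["data"] else stream

def tag_length(stream):
    """Active plug-in: automatic tagging by adding a property."""
    stream["properties"]["length"] = len(stream["data"])
    return stream
```

A consumer can be any callable that accepts the stream, for example an indexer's `append` method: `run_pipeline(doc, [exclude_drafts, tag_length], [indexed.append])`.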