Quick Patent Search - Quickly search global patents, free commercial patent database - IPRDB

1. Invention application

US20120158731A1 DERIVING DOCUMENT SIMILARITY INDICES (Status: In force)
Publication No.: US20120158731A1
Publication date: 2012-06-21
Application No.: US12970650
Filing date: 2010-12-16
Applicants: Sorin Gherman, Kunal Mukerjee, Adam Prout
Inventors: Sorin Gherman, Kunal Mukerjee, Adam Prout
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30705
Abstract: The present invention extends to methods, systems, and computer program products for deriving document similarity indices. Embodiments of the invention include scalable and efficient mechanisms for deriving and updating a document similarity index for a plurality of documents. The number of maintained similarities can be controlled to conserve CPU and storage resources.

2. Invention application

US20110264997A1 Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text (Status: Under examination, published)
Publication No.: US20110264997A1
Publication date: 2011-10-27
Application No.: US12764107
Filing date: 2010-04-21
Applicants: Kunal Mukerjee, Sorin Gherman
Inventors: Kunal Mukerjee, Sorin Gherman
IPC classification: G06F17/30, G06F17/21
CPC classification: G06F16/3334
Abstract: A search engine for documents containing text may process text using a statistical language model, classify the text based on entropy, and create suffix trees or other mappings of the text for each classification. From the suffix trees or mappings, a graph may be constructed with relationship strengths between different words or text strings. The graph may be used to determine search results, and may be browsed or navigated before viewing search results. As new documents are added, they may be processed and added to the suffix trees, then the graph may be created on demand in response to a search request. The graph may be represented as an adjacency matrix, and a transitive closure algorithm may process the adjacency matrix as a background process.

3. Granted invention patent

US08478740B2 Deriving document similarity indices (Status: In force)
Publication No.: US08478740B2
Publication date: 2013-07-02
Application No.: US12970650
Filing date: 2010-12-16
Applicants: Sorin Gherman, Kunal Mukerjee, Adam Prout
Inventors: Sorin Gherman, Kunal Mukerjee, Adam Prout
IPC classification: G06F17/30
CPC classification: G06F17/30011, G06F17/30017, G06F17/3002, G06F17/30705
Abstract: The present invention extends to methods, systems, and computer program products for deriving document similarity indices. Embodiments of the invention include scalable and efficient mechanisms for deriving and updating a document similarity index for a plurality of documents. The number of maintained similarities can be controlled to conserve CPU and storage resources.

4. Invention application

US20120143860A1 IDENTIFYING KEY PHRASES WITHIN DOCUMENTS (Status: In force)
Publication No.: US20120143860A1
Publication date: 2012-06-07
Application No.: US12959840
Filing date: 2010-12-03
Applicants: Sorin Gherman, Kunal Mukerjee
Inventors: Sorin Gherman, Kunal Mukerjee
IPC classification: G06F17/30
CPC classification: G06F17/3053, G06F17/2715, G06F17/2745, G06F17/30864
Abstract: The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).

5. Granted invention patent

US08423546B2 Identifying key phrases within documents (Status: In force)
Publication No.: US08423546B2
Publication date: 2013-04-16
Application No.: US12959840
Filing date: 2010-12-03
Applicants: Sorin Gherman, Kunal Mukerjee
Inventors: Sorin Gherman, Kunal Mukerjee
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/3053, G06F17/2715, G06F17/2745, G06F17/30864
Abstract: The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpuses of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).

6. Granted invention patent

US08200815B1 Method and apparatus for network services metering (Status: In force)
Publication No.: US08200815B1
Publication date: 2012-06-12
Application No.: US13040986
Filing date: 2011-03-04
Applicants: Aditya K. Prasad, Sorin Gherman, Alan S. Geller, Rahul Singh, Nicholas J. Lee
Inventors: Aditya K. Prasad, Sorin Gherman, Alan S. Geller, Rahul Singh, Nicholas J. Lee
IPC classification: G06F15/173
CPC classification: G06F11/3476, G06F11/3495, G06F2201/875, H04L67/02, H04L67/22, H04L67/2833, H04L67/2838, H04M15/00, H04M15/43, H04M15/44
Abstract: Method and apparatus for metering network services, for example Web services. In embodiments, a network services metering system may collect network service usage information via an add usage interface and store the usage information in a database. In one embodiment, the usage information may be partitioned into two or more partitions. Once the usage information has been aggregated and stored, the metering system may be queried to obtain usage statistics such as aggregate usage over specific time intervals. In one embodiment, a pipeline mechanism that generates and processes batches of usage information may be implemented for adding usage information to the database. The pipeline mechanism may help to reduce or eliminate redundancy and loss of usage information, and may make the metering system linearly scalable in multiple dimensions.

7. Granted invention patent

US07908358B1 Method and apparatus for network services metering (Status: In force)
Publication No.: US07908358B1
Publication date: 2011-03-15
Application No.: US11396402
Filing date: 2006-03-31
Applicants: Aditya K. Prasad, Sorin Gherman, Alan S. Geller, Rahul Singh, Nicholas J. Lee
Inventors: Aditya K. Prasad, Sorin Gherman, Alan S. Geller, Rahul Singh, Nicholas J. Lee
IPC classification: G06F15/173
CPC classification: G06F11/3476, G06F11/3495, G06F2201/875, H04L67/02, H04L67/22, H04L67/2833, H04L67/2838, H04M15/00, H04M15/43, H04M15/44
Abstract: Method and apparatus for metering network services, for example Web services. In embodiments, a network services metering system may collect network service usage information via an add usage interface and store the usage information in a database. In one embodiment, the usage information may be partitioned into two or more partitions. Once the usage information has been aggregated and stored, the metering system may be queried to obtain usage statistics such as aggregate usage over specific time intervals. In one embodiment, a pipeline mechanism that generates and processes batches of usage information may be implemented for adding usage information to the database. The pipeline mechanism may help to reduce or eliminate redundancy and loss of usage information, and may make the metering system linearly scalable in multiple dimensions.
