Quick Patent Search: search global patents quickly, free database of patents for commercial use (IPRDB)

1. Granted invention patent

US08600997B2 Method and framework to support indexing and searching taxonomies in large scale full text indexes [In force]
Publication No.: US08600997B2
Publication date: 2013-12-03
Application No.: US11241687
Filing date: 2005-09-30
Applicants: Nadav Eiron, Daniel N. Meredith, Joerg Meyer, Jan H. Pieper, Andrew S. Tomkins
Inventors: Nadav Eiron, Daniel N. Meredith, Joerg Meyer, Jan H. Pieper, Andrew S. Tomkins
IPC classification: G06F7/00, G06F17/30
CPC classification: G06F17/30734
Abstract: A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
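
As a rough illustration of the indexing idea in this abstract, the following Python sketch builds posting lists whose entries pair a location with associated data, and indexes each group name at the locations of its member entities, with the member's name as the data. The function name `build_index`, the input shapes, and the sample taxonomy are assumptions made for this example; they are not taken from the patent.

```python
from collections import defaultdict


def build_index(entities, groups):
    """Build posting lists mapping an index term to (location, data) entries.

    entities: list of (location, entity_name, terms) tuples (hypothetical shape)
    groups:   dict of group_name -> set of member entity names (hypothetical shape)
    """
    postings = defaultdict(list)
    for location, name, terms in entities:
        # Each term defining the entity gets an entry at the entity's location.
        for term in terms:
            postings[term].append((location, None))
        # The group name is indexed at the same location, and the entry's
        # data records which member entity occurred there.
        for group, members in groups.items():
            if name in members:
                postings[group].append((location, name))
    return postings


entities = [(3, "poodle", ["poodle"]), (9, "beagle", ["beagle"])]
groups = {"dog": {"poodle", "beagle"}}
index = build_index(entities, groups)
print(index["dog"])  # entries for the group, one per member occurrence
```

A query for the group term `dog` then finds every location where a member entity occurs, and the data in each entry tells which member matched.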

2. Granted invention patent

US06993534B2 Data store for knowledge-based data mining system [In force]
Publication No.: US06993534B2
Publication date: 2006-01-31
Application No.: US10142673
Filing date: 2002-05-08
Applicants: Matthew Denesuk, Daniel Frederick Gruhl, Kevin Snow McCurley, Joerg Meyer, Sridhar Rajagopalan, Andrew S. Tomkins, Jason Yeong Zien
Inventors: Matthew Denesuk, Daniel Frederick Gruhl, Kevin Snow McCurley, Joerg Meyer, Sridhar Rajagopalan, Andrew S. Tomkins, Jason Yeong Zien
IPC classification: G06F17/30
CPC classification: G06F17/30864, Y10S707/99942, Y10S707/99943, Y10S707/99945
Abstract: In a data mining system, data is gathered into a data store using, e.g., a Web crawler. The data is classified into entities and stored into underlying vertical and horizontal tables respectively representing miner outputs and entities that can be the subjects of indexing. Data miners use rules to process the entities and append respective keys to the entities representing characteristics of the entities as derived from rules embodied in the miners, with the keys being associated with the entities in the tables. With these keys, characteristics of entities as defined by disparate expert authors of the data miners are identified for use in responding to complex data requests from customers.

3. Granted invention patent

US08041705B2 Microhubs and its applications [In force]
Publication No.: US08041705B2
Publication date: 2011-10-18
Application No.: US12348336
Filing date: 2009-01-05
Applicants: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
Inventors: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
IPC classification: G06F17/30
CPC classification: G06F17/30864, Y10S707/99932, Y10S707/99937
Abstract: A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to-be-recrawled webpage includes links to fresh content published on the website; sorting all the to-be-recrawled pages by their hub scores; and crawling the to-be-recrawled pages in order from highest hub scores to lowest hub scores. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to-be-recrawled page; computing a second value equaling a percentage of a previous hub score of the to-be-recrawled page; and computing the hub score as a sum of the first and the second values.
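
The hub score described in this abstract is a sum of two weighted values, which can be sketched in Python as below. The abstract does not state the percentages, so the 0.5 weights, the function names `hub_score` and `recrawl_order`, and the sample pages are all assumptions chosen for illustration.

```python
def hub_score(new_relative_urls, previous_score, new_weight=0.5, old_weight=0.5):
    """Sum of a percentage of the new-URL count and a percentage of the old score.

    The weights are placeholders; the abstract only says "a percentage".
    """
    first_value = new_weight * new_relative_urls
    second_value = old_weight * previous_score
    return first_value + second_value


def recrawl_order(pages):
    """Return page URLs sorted from highest hub score to lowest.

    pages: dict of url -> (new_relative_urls, previous_score) (hypothetical shape)
    """
    scores = {url: hub_score(n, prev) for url, (n, prev) in pages.items()}
    return sorted(scores, key=scores.get, reverse=True)


pages = {"/news": (8, 4.0), "/about": (0, 1.0), "/blog": (3, 6.0)}
order = recrawl_order(pages)
print(order)  # ['/news', '/blog', '/about']
```

Carrying part of the previous score forward means a page that often linked to fresh content keeps some priority even on a crawl where it shows few new URLs.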

4. Granted invention patent

US07496557B2 Microhubs and its applications [Lapsed]
Publication No.: US07496557B2
Publication date: 2009-02-24
Application No.: US11241469
Filing date: 2005-09-30
Applicants: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
Inventors: Srinivasan Balasubramanian, Michael Ching, Piyoosh Jalan, Satish C. Penmetsa, Andrew S. Tomkins
IPC classification: G06F13/30
CPC classification: G06F17/30864, Y10S707/99932, Y10S707/99937
Abstract: A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to-be-recrawled webpage includes links to fresh content published on the website; sorting all the to-be-recrawled pages by their hub scores; and crawling the to-be-recrawled pages in order from highest hub scores to lowest hub scores. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to-be-recrawled page; computing a second value equaling a percentage of a previous hub score of the to-be-recrawled page; and computing the hub score as a sum of the first and the second values.

5. Granted invention patent

US07464078B2 Method for automatically extracting by-line information [Lapsed]
Publication No.: US07464078B2
Publication date: 2008-12-09
Application No.: US11259608
Filing date: 2005-10-25
Applicants: Stephen Dill, Madhukar R. Korupolu, Andrew S. Tomkins
Inventors: Stephen Dill, Madhukar R. Korupolu, Andrew S. Tomkins
IPC classification: G06F17/30
CPC classification: G06F17/30719, Y10S707/99932, Y10S707/99933
Abstract: A by-line extraction method detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The method constructs the set of potential headlines based on the title meta-tag. The method selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The method extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.

6. Granted invention patent

US07281022B2 System, method, and service for segmenting a topic into chatter and subtopics [In force]
Publication No.: US07281022B2
Publication date: 2007-10-09
Application No.: US10847084
Filing date: 2004-05-15
Applicants: Daniel Frederick Gruhl, Ramanathan Vaidhyanath Guha, Andrew S. Tomkins
Inventors: Daniel Frederick Gruhl, Ramanathan Vaidhyanath Guha, Andrew S. Tomkins
IPC classification: G06F17/00
CPC classification: G06F17/30705, Y10S707/99945, Y10S707/99948
Abstract: A topic segmenting system segments a topic into chatter and subtopics. The system decomposes a conversation into topics, producing a time-based structure for topics and subtopics in the conversation. The system extracts a large number of topics at all levels of granularity. Some of the topics extracted correspond to broad topics and some correspond to “spiky” topics or subtopics. The system comprises a process for automatically detecting spiky regions of a topic. For each possible broad topic, the present system finds regions where coverage of the broad topic overlaps significantly with the spiky region of another topic. The system then removes the spiky subtopic from the conversation. Processing is repeated until all discernable topics have been identified and removed from the conversation, yielding random topics of little duration or intensity.

7. Granted invention patent

US09600800B2 Creating secure social applications with extensible types [In force]
Publication No.: US09600800B2
Publication date: 2017-03-21
Application No.: US12615986
Filing date: 2009-11-10
Applicants: Andrew S. Tomkins, Raghu Ramakrishnan, Shanmugasundaram Ravikumar
Inventors: Andrew S. Tomkins, Cameron A. Marlow, Raghu Ramakrishnan, Shanmugasundaram Ravikumar
IPC classification: G06F15/16, G06Q10/10
CPC classification: G06Q30/0625, G06F17/30864, G06Q10/10, G06Q30/0282, G06Q50/01
Abstract: A social environment is provided by creating an object in response to recognition of an entity in a portion of web content, wherein the object represents the entity, the object is associated with a type selected from a set of types, and the type is associated with a schema selected from a set of schemas, where the social environment includes a set of objects including the object, wherein the objects are instances of corresponding types in a rich system of predefined types, the schemas are associated with the types, metadata is associated with the objects, and there is at least one relationship between at least two objects selected from the set of objects, where the set of objects and the metadata are extensible, such that extensions provided by a first user are available for use by a second user. In one example, metadata provided by a first user is only available to a second user having a relationship with the first user.

8. Invention patent application

US20120259890A1 KNOWLEDGE-BASED DATA MINING SYSTEM [Under examination, published]
Publication No.: US20120259890A1
Publication date: 2012-10-11
Application No.: US13526424
Filing date: 2012-06-18
Applicants: Matthew Denesuk, Daniel Frederick Gruhl, Sridhar Rajagopalan, Andrew S. Tomkins
Inventors: Matthew Denesuk, Daniel Frederick Gruhl, Sridhar Rajagopalan, Andrew S. Tomkins
IPC classification: G06F17/30
CPC classification: G06F16/951, G06F2216/03
Abstract: In a data mining system, data is gathered into a data store using, e.g., a Web crawler. The data is classified into entities. Data miners use rules to process the entities and append respective keys to the entities representing characteristics of the entities as derived from rules embodied in the miners. With these keys, characteristics of entities as defined by disparate expert authors of the data miners are identified for use in responding to complex data requests from customers.

9. Granted invention patent

US07725346B2 Method and computer program product for predicting sales from online public discussions [In force]
Publication No.: US07725346B2
Publication date: 2010-05-25
Application No.: US11191776
Filing date: 2005-07-27
Applicants: Daniel Frederick Gruhl, Ramanathan Vaidhyanath Guha, Jasmine Novak, Shanmugasundaram Ravikumar, Andrew S. Tomkins
Inventors: Daniel Frederick Gruhl, Ramanathan Vaidhyanath Guha, Jasmine Novak, Shanmugasundaram Ravikumar, Andrew S. Tomkins
IPC classification: G06F17/18
CPC classification: G06Q30/02, G06Q30/0202
Abstract: A sales prediction system predicts sales from online public discussions. The system utilizes manually or automatically formulated predicates to capture subsets of postings in online public discussions. The system predicts spikes in sales rank based on online chatter. The system comprises automated algorithms that predict spikes in sales rank given a time series of counts of online discussions such as blog postings. The system utilizes a stateless model of customer behavior based on a series of states of excitation that are increasingly likely to lead to a purchase decision. The stateless model of customer behavior yields a predictor of sales rank spikes that is significantly more accurate than conventional techniques operating on sales rank data alone.

10. Granted invention patent

US06886129B1 Method and system for trawling the World-wide Web to identify implicitly-defined communities of web pages [Lapsed]
Publication No.: US06886129B1
Publication date: 2005-04-26
Application No.: US09449697
Filing date: 1999-11-24
Applicants: Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew S. Tomkins
Inventors: Prabhakar Raghavan, Sridhar Rajagopalan, Shanmugasundaram Ravikumar, Andrew S. Tomkins
IPC classification: G06F17/30
CPC classification: G06F17/3071, G06F17/30873, Y10S707/99932, Y10S707/99933
Abstract: A method and system for identifying groups of pages of common interest from a collection of hyper-linked pages are disclosed. A plurality of community cores are identified from the collection where each core includes first and second sets of pages, and each page in the first set points to every page in the second set. Each identified core is expanded into a full community which is a subset of the pages regarding a particular topic. The identification of community cores is based on the analysis of the Web graph in which the communities correspond to instances of Web subgraphs. Extraneous pages are then pruned to improve the quality of the resulting communities.
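
The core condition stated in this abstract (each page in the first set points to every page in the second set) describes a complete bipartite pattern in the link graph. A minimal Python check of that condition might look like the following; the function name `is_core`, the adjacency-dict representation, and the sample graph are assumptions for illustration, and the sketch covers only the core test, not the expansion or pruning steps.

```python
def is_core(links, first_set, second_set):
    """Return True if every page in first_set links to every page in second_set.

    links: dict of page -> set of pages it points to (hypothetical representation)
    """
    return all(second_set <= links.get(page, set()) for page in first_set)


links = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"y"},
}
print(is_core(links, {"a", "b"}, {"x", "y"}))  # True
print(is_core(links, {"a", "c"}, {"x", "y"}))  # False: "c" does not link to "x"
```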
