    • 1. Granted invention patent
    • Method and framework to support indexing and searching taxonomies in large scale full text indexes
    • US08600997B2
    • 2013-12-03
    • US11241687
    • 2005-09-30
    • Nadav Eiron; Daniel N. Meredith; Joerg Meyer; Jan H. Pieper; Andrew S. Tomkins
    • G06F7/00; G06F17/30
    • G06F17/30734
    • A system and method of indexing a plurality of entities located in a taxonomy, the entities comprising sets of terms, comprises receiving terms in an index structure; building a posting list for an entity with respect to the locations of the set of terms defining the entity and data associated with the respective terms; and indexing a name of a group comprising the entities within this group at the location of the entities with the data of the group comprising the name of the respective entity at each location. The building of the posting list comprises storing the location of the term and data associated with the term in an entry in the posting list for the term. The method comprises indexing aliases of the name of the group comprising the term, and using an inverted list index to associate data with each occurrence of an index term.
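The indexing scheme in the abstract above can be illustrated with a minimal sketch. It is not the patented implementation: the `Posting`, `InvertedIndex`, and `index_taxonomy` names, and the use of integer token offsets as locations, are assumptions made for illustration. Each entity is indexed at its locations, and each group name (and alias) is indexed at the locations of its member entities, with the member's name stored as the payload.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    location: int  # token offset where the entity occurs (assumed representation)
    data: str      # payload stored with this occurrence


class InvertedIndex:
    """Maps each index term to its posting list."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, term, location, data=""):
        self.postings[term].append(Posting(location, data))

    def lookup(self, term):
        # .get avoids creating empty posting lists for unknown terms
        return self.postings.get(term, [])


def index_taxonomy(index, entity_locations, groups, aliases):
    """entity_locations: entity -> locations; groups: group -> member entities;
    aliases: group -> alternative group names (all hypothetical inputs)."""
    # Index every entity at its own locations.
    for entity, locations in entity_locations.items():
        for loc in locations:
            index.add(entity, loc, data=entity)
    # Index the group name and its aliases at the members' locations,
    # storing the member entity's name as the per-occurrence data.
    for group, members in groups.items():
        for name in [group, *aliases.get(group, [])]:
            for entity in members:
                for loc in entity_locations.get(entity, []):
                    index.add(name, loc, data=entity)
```

With this layout, a query for a group name or alias returns the locations of all member entities, and the payload identifies which entity matched at each location.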
    • 3. Granted invention patent
    • Microhubs and its applications
    • US08041705B2
    • 2011-10-18
    • US12348336
    • 2009-01-05
    • Srinivasan Balasubramanian; Michael Ching; Piyoosh Jalan; Satish C. Penmetsa; Andrew S. Tomkins
    • G06F17/30
    • G06F17/30864; Y10S707/99932; Y10S707/99937
    • A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to be recrawled webpage includes links to fresh content published on the website; sorting all the to be recrawled pages by their hub scores; and crawling the to be recrawled pages in order from highest hub scores to lowest hub scores. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to be recrawled page; computing a second value equaling a percentage of a previous hub score of the to be recrawled page; and computing the hub score as a sum of the first and the second values.
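The hub-score calculation described in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the `Page` class, the function names, and the two weights (`new_weight`, `history_weight`) are hypothetical, and a link is treated as "new" when it is absent from the lookup structure of known URLs.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    links: list[str]                 # relative URLs found on the page
    previous_hub_score: float = 0.0  # score from the previous crawl


def hub_score(page, known_urls, new_weight=0.7, history_weight=0.3):
    """Sum of a fraction of the new-link count and a fraction of the old score.

    The weights are illustrative values, not taken from the patent.
    """
    new_links = sum(1 for link in page.links if link not in known_urls)
    first_value = new_weight * new_links
    second_value = history_weight * page.previous_hub_score
    return first_value + second_value


def recrawl_order(pages, known_urls):
    """Sort pages from highest to lowest hub score."""
    return sorted(pages, key=lambda p: hub_score(p, known_urls), reverse=True)
```

Carrying a fraction of the previous score forward means a page that recently linked to fresh content keeps some priority even if one crawl finds nothing new.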
    • 4. Granted invention patent
    • Microhubs and its applications
    • US07496557B2
    • 2009-02-24
    • US11241469
    • 2005-09-30
    • Srinivasan Balasubramanian; Michael Ching; Piyoosh Jalan; Satish C. Penmetsa; Andrew S. Tomkins
    • G06F13/30
    • G06F17/30864; Y10S707/99932; Y10S707/99937
    • A system and method of crawling at least one website comprising at least one URL includes maintaining a lookup structure comprising all of the URLs known to be on a website; calculating a hub score for each webpage of the website to be recrawled, wherein the hub score measures how likely the to be recrawled webpage includes links to fresh content published on the website; sorting all the to be recrawled pages by their hub scores; and crawling the to be recrawled pages in order from highest hub scores to lowest hub scores. The calculating comprises computing a first value equaling a percentage of a number of new relative URLs on the to be recrawled page; computing a second value equaling a percentage of a previous hub score of the to be recrawled page; and computing the hub score as a sum of the first and the second values.
    • 5. Granted invention patent
    • Method for automatically extracting by-line information
    • US07464078B2
    • 2008-12-09
    • US11259608
    • 2005-10-25
    • Stephen Dill; Madhukar R. Korupolu; Andrew S. Tomkins
    • G06F17/30
    • G06F17/30719; Y10S707/99932; Y10S707/99933
    • A by-line extraction method detects a set of potential headlines from a title meta-tag of a crawled document, selects a candidate headline from the set of potential headlines, and extracts the by-line information from the document using the location of the selected candidate headline. The method constructs the set of potential headlines based on the title meta-tag. The method selects a candidate headline by evaluating the set of potential headlines in order of the lengths of the potential headlines. The method extracts the by-line information from the document by using the location of the selected candidate headline to extract a string representing a date, a name, or a source located within a minimum distance from the location of the potential headline.
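The by-line extraction steps in the abstract above can be sketched roughly as below. Several details are assumptions rather than parts of the patented method: splitting the title on `-`, `|`, or `:` separators, trying the longest candidate first, searching only the text that follows the headline, the 200-character window, and the simple date and "By Name" patterns. All function names are hypothetical.

```python
import re

SEPARATORS = re.compile(r"\s+[-|:]\s+")           # assumed title separators
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")       # assumed ISO-style dates only
AUTHOR = re.compile(r"\bBy[ ]+([A-Z][a-z]+(?:[ ]+[A-Z][a-z]+)+)")


def potential_headlines(title):
    """Build candidate headlines from the title meta-tag, longest first."""
    parts = [p.strip() for p in SEPARATORS.split(title) if p.strip()]
    candidates = set(parts + [title.strip()])
    return sorted(candidates, key=lambda c: (-len(c), c))


def select_headline(title, body):
    """Return (headline, position) for the first candidate found in the body."""
    for candidate in potential_headlines(title):
        pos = body.find(candidate)
        if pos != -1:
            return candidate, pos
    return None


def extract_byline(title, body, window=200):
    """Look for a name and a date within `window` characters after the headline."""
    selected = select_headline(title, body)
    if selected is None:
        return {}
    headline, pos = selected
    start = pos + len(headline)
    nearby = body[start:start + window]
    result = {}
    author = AUTHOR.search(nearby)
    if author:
        result["author"] = author.group(1)
    date = DATE.search(nearby)
    if date:
        result["date"] = date.group(0)
    return result
```

Anchoring the search at the headline's position keeps the extractor from picking up dates or names that appear elsewhere on the page, such as in navigation or related-article links.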
    • 6. Granted invention patent
    • System, method, and service for segmenting a topic into chatter and subtopics
    • US07281022B2
    • 2007-10-09
    • US10847084
    • 2004-05-15
    • Daniel Frederick Gruhl; Ramanathan Vaidhyanath Guha; Andrew S. Tomkins
    • G06F17/00
    • G06F17/30705; Y10S707/99945; Y10S707/99948
    • A topic segmenting system segments a topic into chatter and subtopics. The system decomposes a conversation into topics, producing a time-based structure for topics and subtopics in the conversation. The system extracts a large number of topics at all levels of granularity. Some of the topics extracted correspond to broad topics and some correspond to “spiky” topics or subtopics. The system comprises a process for automatically detecting spiky regions of a topic. For each possible broad topic, the present system finds regions where coverage of the broad topic overlaps significantly with the spiky region of another topic. The system then removes the spiky subtopic from the conversation. Processing is repeated until all discernable topics have been identified and removed from the conversation, yielding random topics of little duration or intensity.
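The abstract above does not say how spiky regions are detected, so the following is only one plausible sketch of that single step. It assumes a topic is represented by a list of per-day mention counts and flags runs of days whose count exceeds the mean by `k` standard deviations. The function name, the threshold rule, and the default `k` are hypothetical, not taken from the patent.

```python
from statistics import mean, pstdev


def spiky_regions(counts, k=1.5):
    """Return (start_day, end_day) runs where counts exceed mean + k * stdev.

    The threshold rule is an illustrative assumption.
    """
    threshold = mean(counts) + k * pstdev(counts)
    regions = []
    start = None
    for day, count in enumerate(counts):
        if count > threshold and start is None:
            start = day                      # a spike begins
        elif count <= threshold and start is not None:
            regions.append((start, day - 1))  # the spike ended on the previous day
            start = None
    if start is not None:
        regions.append((start, len(counts) - 1))
    return regions
```

A series with steady low counts and a short burst yields a single region covering the burst, while a flat series yields none because no count exceeds the threshold.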