    • 122. Granted invention patent
    • Robust discovery of entity synonyms using query logs
    • US08745019B2
    • 2014-06-03
    • US13487260
    • 2012-06-04
    • Tao Cheng; Kaushik Chakrabarti; Surajit Chaudhuri; Dong Xin
    • G06F17/30
    • G06F17/30672
    • A similarity analysis framework is described herein which leverages two or more similarity analysis functions to generate synonyms for an entity reference string r_e. The functions are selected such that the synonyms that are generated by the framework satisfy a core set of synonym-related properties. The functions operate by leveraging query log data. One similarity analysis function takes into consideration the strength of similarity between a particular candidate string s_e and an entity reference string r_e even in the presence of sparse query log data, while another function takes into account the classes of s_e and r_e. The framework also provides indexing mechanisms that expedite its computations. The framework also provides a reduction module for converting long entity reference strings into shorter strings, where each shorter string (if found) contains a subset of the terms in its longer counterpart.
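The abstract above says the similarity functions operate on query log data but does not give formulas. As a minimal sketch of one way such a function could look, the code below treats two query strings as synonym candidates when the documents clicked for each largely overlap. The log format, the `click_overlap` measure, the two-way check, and the 0.6 threshold are all assumptions made for illustration, not the patented method.

```python
from collections import defaultdict

def clicked_docs(log):
    """Map each query string to the set of documents clicked for it."""
    docs = defaultdict(set)
    for query, doc in log:
        docs[query].add(doc)
    return docs

def click_overlap(docs, candidate, reference):
    """Fraction of the candidate's clicked documents also clicked for the reference."""
    cand = docs.get(candidate, set())
    if not cand:
        return 0.0
    return len(cand & docs.get(reference, set())) / len(cand)

def synonyms(log, reference, threshold=0.6):
    """Return queries whose click sets overlap the reference's in both directions."""
    docs = clicked_docs(log)
    return sorted(
        q for q in docs
        if q != reference
        and click_overlap(docs, q, reference) >= threshold
        and click_overlap(docs, reference, q) >= threshold
    )

# Hypothetical (query, clicked document) pairs.
log = [
    ("microsoft excel", "d1"), ("microsoft excel", "d2"),
    ("ms excel", "d1"), ("ms excel", "d2"),
    ("excel tutorial", "d2"), ("excel tutorial", "d3"),
]
result = synonyms(log, "microsoft excel")
```

Requiring overlap in both directions is one simple way to keep a broader or narrower query such as "excel tutorial" from being accepted as a synonym.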
    • 123. Granted invention patent
    • Taxonomy editor
    • US08527893B2
    • 2013-09-03
    • US12713190
    • 2010-02-26
    • Sanjay Agrawal; Surajit Chaudhuri; Venkatesh Ganti; Yuri Siradeghyan
    • G06F3/048
    • G06F17/30734
    • This patent application relates to taxonomy editing. One implementation involves a taxonomy editor configured to generate a visual representation of a taxonomy associated with a set of scientific papers. The taxonomy editor includes a properties module configured to identify properties relating to an individual node of the taxonomy and a statistics module configured to determine trends relating to the individual node. The taxonomy editor further includes a similarity module configured to evaluate keyword similarity relative to individual scientific papers associated with the individual node. The taxonomy editor also includes a suggestion module configured to utilize the properties, the trends and the keyword similarity to identify potential modifications to the taxonomy. The taxonomy editor is further configured to present at least some of the potential modifications, the properties, the trends, and the keyword similarity concurrently with the visual representation of the taxonomy.
    • 124. Invention patent application
    • PROGRESSIVE SPATIAL SEARCHING USING AUGMENTED STRUCTURES
    • US20120173500A1
    • 2012-07-05
    • US12981082
    • 2010-12-29
    • Kaushik Chakrabarti; Surajit Chaudhuri
    • G06F17/30
    • G06F17/3053; G01C21/3611; G01C21/3679; G06F17/30091; G06F17/30241; G06F17/30864
    • A location associated with a user of a computing device and a prefix portion of an input string may be received as one or more successive characters of the input string are provided by the user via the computing device. A list of suggested items may be obtained based on a function of respective recommendation indicators and proximities of the items to the location in response to receiving the prefix portion, and based on partially traversing a character string search structure having a plurality of non-terminal nodes augmented with bound indicators associated with spatial regions. The list of suggested items and descriptive information associated with each suggested item may be returned to the user, in response to receiving the prefix portion, for rendering an image illustrating indicators associated with the list in a manner relative to the location, as the user provides each successive character of the input string.
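The abstract above describes a string search structure whose non-terminal nodes carry bounds tied to spatial regions, so that only part of the structure is traversed for each typed prefix. A minimal sketch of that idea, assuming a character trie in which every node stores the bounding box of the items beneath it and a best-first search that expands the closest boxes first. The names `Node`, `insert`, and `suggest` are hypothetical, and ranking uses distance only; the recommendation indicators mentioned in the abstract are left out.

```python
import heapq
import itertools
import math

class Node:
    def __init__(self):
        self.children = {}
        self.items = []   # (name, x, y) for strings that end at this node
        self.box = None   # (min_x, min_y, max_x, max_y) covering the subtree

def _grow(box, x, y):
    if box is None:
        return (x, y, x, y)
    return (min(box[0], x), min(box[1], y), max(box[2], x), max(box[3], y))

def insert(root, name, x, y):
    """Add an item and widen the bounding box of every node on its path."""
    node = root
    node.box = _grow(node.box, x, y)
    for ch in name:
        node = node.children.setdefault(ch, Node())
        node.box = _grow(node.box, x, y)
    node.items.append((name, x, y))

def box_distance(box, x, y):
    """Lower bound on the distance from (x, y) to any point inside the box."""
    dx = max(box[0] - x, 0.0, x - box[2])
    dy = max(box[1] - y, 0.0, y - box[3])
    return math.hypot(dx, dy)

def suggest(root, prefix, x, y, k=3):
    """Return up to k item names starting with prefix, nearest to (x, y) first."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    if node.box is None:
        return []
    tie = itertools.count()  # tie-breaker so the heap never compares nodes
    heap = [(box_distance(node.box, x, y), next(tie), node, None)]
    results = []
    while heap and len(results) < k:
        _, _, current, item = heapq.heappop(heap)
        if item is not None:
            results.append(item[0])
            continue
        for it in current.items:
            dist = math.hypot(it[1] - x, it[2] - y)
            heapq.heappush(heap, (dist, next(tie), None, it))
        for child in current.children.values():
            heapq.heappush(heap, (box_distance(child.box, x, y), next(tie), child, None))
    return results

# Hypothetical places with planar coordinates.
root = Node()
insert(root, "pizza hut", 0.0, 0.0)
insert(root, "pizza inn", 5.0, 5.0)
insert(root, "pizzeria", 1.0, 1.0)
insert(root, "park", 0.1, 0.1)
nearest = suggest(root, "piz", 0.0, 0.0, k=2)
```

Because a box distance never exceeds the distance to any item inside the box, the search can stop as soon as k items have been popped, leaving far-away subtrees unvisited.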
    • 125. Granted invention patent
    • Leveraging constraints for deduplication
    • US08204866B2
    • 2012-06-19
    • US11804400
    • 2007-05-18
    • Surajit Chaudhuri; Venkatesh Ganti; Shriraghav Kaushik; Anish Das Sarma
    • G06F17/30
    • G06F17/30489
    • A deduplication algorithm provides improved accuracy in data deduplication by using aggregate and/or groupwise constraints. Deduplication satisfies as many of these constraints as possible rather than imposing them inflexibly as hard constraints. Additionally, textual similarity between tuples is leveraged to restrict the search space. The algorithm begins with a coarse initial partition of data records and continues by raising the similarity threshold until the threshold splits a given partition. This sequence of splits defines a rich space of alternatives. Over this space, an algorithm finds a partition of the input that maximizes constraint satisfaction. In the context of groupwise aggregation constraints for deduplication, all SQL (Structured Query Language) aggregates are allowed, including summation.
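The abstract above describes raising a similarity threshold to split partitions and then choosing the partition that satisfies the most constraints. The sketch below is a simplified illustration under stated assumptions: it scans a fixed list of global thresholds instead of splitting each partition separately, uses token Jaccard as the textual similarity, and uses a made-up summation constraint. The names `partition`, `best_partition`, and `total_is_expected` are hypothetical.

```python
def jaccard(a, b):
    """Token-set Jaccard similarity; assumes both strings are non-empty."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / len(ta | tb)

def partition(records, threshold):
    """Group records into connected components of pairs with similarity >= threshold."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if jaccard(records[i][0], records[j][0]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(records[i])
    return list(groups.values())

def best_partition(records, thresholds, satisfied):
    """Try thresholds from coarse to fine; keep the partition with most satisfied groups."""
    best, best_score = None, -1
    for t in sorted(thresholds):
        groups = partition(records, t)
        score = sum(1 for g in groups if satisfied(g))
        if score > best_score:
            best, best_score = groups, score
    return best, best_score

# Hypothetical (name, billed amount) records and expected per-customer totals.
records = [("john smith", 30), ("john a smith", 70), ("mary smith", 50)]
expected_totals = {100, 50}

def total_is_expected(group):
    # Aggregate (summation) constraint: the group's total must match a known total.
    return sum(amount for _, amount in group) in expected_totals

groups, score = best_partition(records, [0.2, 0.3, 0.5, 0.9], total_is_expected)
```

In this example the coarsest thresholds merge all three records (total 150, constraint violated), the finest leaves three singletons (one satisfied), and the 0.5 threshold yields two groups that both satisfy the constraint.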
    • 126. Granted invention patent
    • Example-driven design of efficient record matching queries
    • US08046339B2
    • 2011-10-25
    • US11758202
    • 2007-06-05
    • Surajit Chaudhuri; Bee Chung Chen; Venkatesh Ganti; Shriraghav Kaushik
    • G06F17/30
    • G06F17/30533; G06F17/30495
    • Example-driven creation of record matching queries. The disclosed architecture employs techniques that exploit the availability of positive (matching) and negative (non-matching) examples to search through the space of record matching queries and suggest an initial record matching query. The record matching task is modeled as that of designing an operator tree obtained by composing a few primitive operators. This ensures that record matching programs are executable efficiently and scalably over large input relations. The architecture joins records across multiple (e.g., two) relations (e.g., R and S). The architecture exploits the monotonicity property of similarity functions for record matching in the relations, in that any pair of matching records has a higher similarity value than non-matching record pairs on at least one similarity function.
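The abstract above relies on labeled examples and on the assumption that matching pairs score higher than non-matching pairs on at least one similarity function. As a rough sketch of how examples could suggest an initial rule, the code below picks the single (similarity function, threshold) pair that classifies the most example pairs correctly. The two similarity functions, the example pairs, and `fit_rule` are assumptions for illustration; the patent's operator trees are not modeled.

```python
def token_jaccard(a, b):
    """Jaccard similarity over lowercase word tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def same_initials(a, b):
    """1.0 when the word initials of both strings agree, else 0.0."""
    ia = "".join(w[0] for w in a.lower().split())
    ib = "".join(w[0] for w in b.lower().split())
    return 1.0 if ia == ib else 0.0

def fit_rule(positives, negatives, functions):
    """Return (correct_count, function_name, threshold) for the best single-function rule.

    A pair is predicted to match when its similarity is >= threshold.
    Candidate thresholds are the scores observed on the positive examples.
    """
    best = None
    for name, fn in functions.items():
        pos = [fn(a, b) for a, b in positives]
        neg = [fn(a, b) for a, b in negatives]
        for t in sorted(set(pos)):
            correct = sum(s >= t for s in pos) + sum(s < t for s in neg)
            if best is None or correct > best[0]:
                best = (correct, name, t)
    return best

# Hypothetical labeled example pairs.
positives = [
    ("Acme Corp", "Acme Corporation"),
    ("Intl Business Machines", "International Business Machines"),
]
negatives = [
    ("Acme Corp", "Zenith Corp"),
    ("Business Machines", "Business Cards"),
]
functions = {"token_jaccard": token_jaccard, "same_initials": same_initials}
rule = fit_rule(positives, negatives, functions)
```

Here token overlap cannot separate the examples cleanly, while matching initials classifies all four pairs correctly, so the suggested rule uses `same_initials` with threshold 1.0.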
    • 127. Granted invention patent
    • System and method for searching computer files and returning identified files and associated files
    • US07930301B2
    • 2011-04-19
    • US10403063
    • 2003-03-31
    • Cezary Marcjan; Ryszard Kott; Surajit Chaudhuri; Lili Cheng
    • G06F7/00
    • G06F17/30106
    • A search of an index database or another search method is conducted to identify preliminary results listing one or more selected computer objects having selected identifying information stored in an index database. In addition, one or more selected computer objects of the preliminary search results are correlated with one or more other computer objects that have associations with the selected computer objects of the preliminary search results. Integrated search results are then returned and include the preliminary search results and one or more other computer objects that have associations with the selected computer objects of the preliminary search results. The associations may be determined by an association system and represent relationships between computer files based upon user or other interactions between the objects. The associations between the objects may include similarities between them and their importance.
    • 130. Granted invention patent
    • Detecting duplicate records in databases
    • US07685090B2
    • 2010-03-23
    • US11182590
    • 2005-07-14
    • Surajit Chaudhuri; Venkatesh Ganti; Rohit Ananthakrishna
    • G06F17/30
    • G06F17/30303; Y10S707/99931; Y10S707/99942
    • The invention concerns detection of duplicate tuples in a database. Previous domain-independent detection of duplicated tuples relied on standard similarity functions (e.g., edit distance, cosine metric) between multi-attribute tuples. However, such prior art approaches result in large numbers of false positives if they are used to identify domain-specific abbreviations and conventions. In accordance with the invention, a process for duplicate detection is implemented based on interpreting records from multiple dimensional tables in a data warehouse, which are associated with hierarchies specified through key-foreign key relationships in a snowflake schema. The invention exploits the extra knowledge available from the table hierarchy to develop a high quality, scalable duplicate detection process.