    • 10. Invention application
    • Primitive operator for similarity joins in data cleaning
    • US20070192342A1
    • 2007-08-16
    • US11352141
    • 2006-02-10
    • Kaushik Shriraghav; Surajit Chaudhuri; Venkatesh Ganti
    • Kaushik Shriraghav; Surajit Chaudhuri; Venkatesh Ganti
    • G06F7/00
    • G06F17/30442; Y10S707/99942; Y10S707/99943
    • A set similarity join system and method are provided. The system can be employed to facilitate similarity-based data cleaning by identifying “close” tuples (e.g., records and/or rows). “Closeness” can be evaluated using one or more similarity functions chosen to suit the domain and/or application. Thus, the system facilitates generic, domain-independent data cleaning. The system can be employed with a foundational primitive, the set similarity join (SSJoin) operator, which can be used as a building block to implement a broad variety of notions of similarity (e.g., edit similarity, Jaccard similarity, generalized edit similarity, Hamming distance, Soundex, etc.) as well as similarity based on co-occurrences. The SSJoin operator can exploit the observation that set overlap can be used effectively to support a variety of similarity functions. The SSJoin operator compares values based on the “sets” associated with (or explicitly constructed for) each of them.
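
The abstract's central idea, that set overlap can serve as a common basis for several similarity functions, can be illustrated with a small sketch. This is not the patented implementation. It assumes each value's set is simply its lowercase words, and the names word_set, ssjoin and jaccard_join are hypothetical, chosen only for this illustration. The overlap join requires min_overlap of at least 1, because pairs with nothing in common never meet in the index.

```python
from collections import defaultdict
from itertools import combinations


def word_set(value):
    """Build the set associated with a value (assumption: its lowercase words)."""
    return frozenset(value.lower().split())


def ssjoin(records, min_overlap):
    """Return sorted (i, j, overlap) for pairs sharing at least min_overlap elements."""
    sets = [word_set(r) for r in records]

    # Inverted index: element -> ids of the records whose set contains it.
    index = defaultdict(list)
    for rid, s in enumerate(sets):
        for element in s:
            index[element].append(rid)

    # Each posting list contributes one shared element to every pair in it.
    overlap = defaultdict(int)
    for ids in index.values():
        for i, j in combinations(ids, 2):
            overlap[(i, j)] += 1

    return sorted((i, j, n) for (i, j), n in overlap.items() if n >= min_overlap)


def jaccard_join(records, threshold):
    """Jaccard similarity join layered on top of the overlap join."""
    sets = [word_set(r) for r in records]
    result = []
    for i, j, n in ssjoin(records, 1):
        union = len(sets[i]) + len(sets[j]) - n
        if n / union >= threshold:
            result.append((i, j))
    return result


records = [
    "Microsoft Corp Redmond WA",
    "Microsoft Corporation Redmond WA",
    "Oracle Corp Redwood City CA",
]
print(ssjoin(records, 2))          # pairs sharing at least two words
print(jaccard_join(records, 0.5))  # pairs with Jaccard similarity >= 0.5
```

Other notions of similarity named in the abstract, such as edit similarity, would need a different set construction (for example q-grams instead of words) and a different post-filter, but could reuse the same overlap step.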