    • 5. Granted invention patent
    • Efficient column based data encoding for large-scale data storage
    • US08452737B2
    • 2013-05-28
    • US13347367
    • 2012-01-10
    • Amir Netz; Cristian Petculescu; Ioan Bogdan Crivat
    • Amir Netz; Cristian Petculescu; Ioan Bogdan Crivat
    • G06F17/30
    • G06F17/30501; G06F17/30315; H03M7/30; H03M7/48
    • The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
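
The pipeline in this abstract (dictionary encoding of a column into integers, then a choice between run-length encoding and bit packing based on bit savings) can be illustrated with a minimal Python sketch. It is not the patented algorithm: the function names, the fixed `count_bits` cost of a run counter, and the simple per-run cost comparison are all assumptions made for illustration.

```python
from itertools import groupby


def dictionary_encode(column):
    """Map each distinct value to a small integer ID, in first-seen order."""
    dictionary = {}
    ids = [dictionary.setdefault(value, len(dictionary)) for value in column]
    return dictionary, ids


def hybrid_encode(ids, count_bits=32):
    """Split an ID sequence into ordered RLE and bit-pack segments.

    Assumption: a run costs value_bits + count_bits when run-length encoded
    and length * value_bits when bit packed; the cheaper option wins.
    """
    value_bits = max(1, max(ids, default=0).bit_length())
    segments = []
    for value, group in groupby(ids):
        length = sum(1 for _ in group)
        if value_bits + count_bits < length * value_bits:
            segments.append(("rle", value, length))
        elif segments and segments[-1][0] == "pack":
            # Extend the current bit-packed segment instead of starting a new one.
            segments[-1][1].extend([value] * length)
        else:
            segments.append(("pack", [value] * length))
    return segments, value_bits


def hybrid_decode(segments):
    """Rebuild the original ID sequence from the ordered segments."""
    ids = []
    for segment in segments:
        if segment[0] == "rle":
            ids.extend([segment[1]] * segment[2])
        else:
            ids.extend(segment[1])
    return ids
```

In this sketch a long run of one repeated value becomes a single `("rle", value, length)` segment, while short runs are collected into a `("pack", [...])` segment that would be stored at `value_bits` bits per entry.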
    • 7. Invention patent application
    • PROCESSING RECORDS IN DYNAMIC RANGES
    • US20120271845A1
    • 2012-10-25
    • US13092978
    • 2011-04-25
    • Amir Netz; Cristian Petculescu
    • Amir Netz; Cristian Petculescu
    • G06F17/30
    • G06F17/30454; G06F17/30412
    • A scalable analysis system is described herein that performs common data analysis operations such as distinct counts and data grouping in a more scalable and efficient manner. The system allows distinct counts and data grouping to be applied to large datasets with predictable growth in the cost of the operation. The system dynamically partitions data based on the actual data distribution, which provides both scalability and uncompromised performance. The system sets a budget of available memory or other resources to use for the operation. As the operation progresses, the system determines whether the budget of memory is nearing exhaustion. Upon detecting that the memory used is near the limit, the system dynamically partitions the data. If the system still detects memory pressure, then the system partitions again, until a partition level is identified that fits within the memory budget.
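
The budget-driven repartitioning described in this abstract can be sketched roughly as follows for a distinct count. This is a simplified illustration, not the system in the application: it partitions by hash value rather than by the observed data distribution, it restarts the scan at each new partition level, and the names `distinct_count`, `read_values`, `budget`, and `max_level` are hypothetical.

```python
def distinct_count(read_values, budget, max_level=16):
    """Count distinct values while holding at most `budget` of them in memory.

    `read_values` is a zero-argument callable that returns a fresh iterator,
    so the data can be re-scanned once per partition.
    Returns (distinct_count, number_of_partitions_used).
    """
    for level in range(max_level + 1):
        partitions = 1 << level  # double the partition count at each level
        total = 0
        fits = True
        for p in range(partitions):
            seen = set()  # only one partition's values are held at a time
            for value in read_values():
                if hash(value) % partitions != p:
                    continue
                seen.add(value)
                if len(seen) > budget:
                    fits = False  # memory pressure: partition again
                    break
            if not fits:
                break
            total += len(seen)
        if fits:
            return total, partitions
    raise MemoryError("no partition level fits within the budget")
```

Because each value falls into exactly one partition, the per-partition distinct counts can be summed without double counting. The trade-off in this sketch is extra scans of the input in exchange for a bounded working set.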
    • 8. Granted invention patent
    • Random access in run-length encoded structures
    • US07952499B1
    • 2011-05-31
    • US12696226
    • 2010-01-29
    • Bogdan Crivat; Cristian Petculescu; Amir Netz
    • Bogdan Crivat; Cristian Petculescu; Amir Netz
    • H03M7/46
    • H03M7/46
    • Random access to run-length encoded data values is provided. A target value is identified by a logical index into a structure of run-length-encoded values. To access the value, a bookmark is selected based on the logical index, on a maximum logical index of the bookmark, and on a specified bookmark distance. An initial run in the structure is located, based on the selected bookmark. A final run is chosen, at most one bookmark distance from the initial run. The target value is the value of the final run. Efficiency heuristics are used when generating bookmarks or creating the structure of run-length-encoded values.
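
A rough Python sketch of bookmark-based random access into run-length-encoded data follows. It assumes one bookmark per multiple of a fixed `distance`, each pointing at the run that covers that logical index; the patent's bookmark selection also considers the bookmark's maximum logical index, which this simplified version omits. The class name `BookmarkedRuns` and its fields are hypothetical.

```python
class BookmarkedRuns:
    """Run-length-encoded values with bookmarks for random access."""

    def __init__(self, runs, distance):
        self.runs = runs          # list of (value, run_length) pairs
        self.distance = distance  # logical distance between bookmarks
        self.bookmarks = []       # bookmark k -> (run index, run start index)
        start = 0
        for run_index, (_, length) in enumerate(runs):
            end = start + length
            # Every multiple of `distance` inside this run gets a bookmark.
            while len(self.bookmarks) * distance < end:
                self.bookmarks.append((run_index, start))
            start = end
        self.total = start        # total number of logical values

    def __getitem__(self, index):
        if not 0 <= index < self.total:
            raise IndexError(index)
        # Jump to the initial run via the bookmark, then scan forward.
        run_index, start = self.bookmarks[index // self.distance]
        while index >= start + self.runs[run_index][1]:
            start += self.runs[run_index][1]
            run_index += 1
        return self.runs[run_index][0]
```

The forward scan never covers more than one bookmark distance of logical indices, so a smaller `distance` gives faster lookups at the cost of more bookmarks.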