    • 23. Granted invention patent
    • Data compression of large scale data stored in sparse tables
    • US07548928B1
    • 2009-06-16
    • US11197922
    • 2005-08-05
    • Jeffrey A. Dean; Sanjay Ghemawat
    • G06F17/30
    • G06F17/30985; H03M7/3084; Y10S707/99942; Y10S707/99945
    • A method of compressing data in a table data structure begins by accessing a data set within the table data structure, the data set having associated therewith a range of rows of the table data structure. Data items in the data set are represented by key-value pairs. The method includes applying a first compression to the values of the key-value pairs in the data set to produce a first compressed output; applying a second compression, distinct from the first compression, to the keys of the key-value pairs in the data set to produce a second compressed output; and applying a third compression to the first compressed output and second compressed output to produce a first compressed output block, wherein the third compression is distinct from the first compression and second compression.
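The three-stage scheme in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name the compression algorithms, so `zlib`, `bz2`, and `lzma` are stand-ins chosen because they are three distinct codecs in the Python standard library. The function names and the length-prefixed serialization are likewise hypothetical.

```python
import bz2
import lzma
import struct
import zlib


def _pack(items):
    """Length-prefix each byte string so the sequence can be split again."""
    return b"".join(struct.pack(">I", len(b)) + b for b in items)


def _unpack(blob):
    """Inverse of _pack: recover the list of byte strings."""
    items, pos = [], 0
    while pos < len(blob):
        (n,) = struct.unpack_from(">I", blob, pos)
        pos += 4
        items.append(blob[pos:pos + n])
        pos += n
    return items


def compress_block(pairs):
    """Compress the key-value pairs of one data set (a range of rows)."""
    keys = [k for k, _ in pairs]
    values = [v for _, v in pairs]
    first = zlib.compress(_pack(values))   # first compression: values (zlib is an assumption)
    second = bz2.compress(_pack(keys))     # second compression: keys (bz2 is an assumption)
    # third compression, distinct from the other two, over both outputs
    return lzma.compress(_pack([first, second]))


def decompress_block(block):
    """Undo the three stages and return the original key-value pairs."""
    first, second = _unpack(lzma.decompress(block))
    values = _unpack(zlib.decompress(first))
    keys = _unpack(bz2.decompress(second))
    return list(zip(keys, values))
```

Keys and values are compressed separately here because, as the abstract describes, each stream gets its own compression before the combined output is compressed again; which codec suits which stream in practice depends on the data.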
    • 25. Invention patent application
    • Efficient Indexing of Documents with Similar Content
    • US20120303622A1
    • 2012-11-29
    • US13571316
    • 2012-08-09
    • Jeffrey A. Dean; Sanjay Ghemawat; Gautham Thambidorai
    • G06F17/30
    • G06F17/3071
    • A computer system comprising one or more processors and memory groups a set of documents into a plurality of clusters. Each cluster includes one or more documents of the set of documents and a respective cluster of documents of the plurality of clusters includes respective cluster data corresponding to a plurality of documents including a first document and a second document. The computer system determines that the second document includes duplicate data that is duplicative of corresponding data in the first document, identifies a respective subset of the respective cluster data that excludes at least a subset of the duplicate data, and generates an index of the respective subset of the respective cluster data.
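The indexing step in this abstract can be sketched roughly as below. It is a minimal illustration, not the claimed method: it assumes documents are already grouped into a cluster and split into text chunks, treats only exact chunk matches as duplicates, and uses a hypothetical function name `index_cluster` and a simple whitespace tokenizer.

```python
from collections import defaultdict


def index_cluster(cluster):
    """Index one cluster while skipping data duplicated in an earlier document.

    cluster: dict mapping doc_id -> list of text chunks (e.g. paragraphs).
    Returns (index, duplicates), where index maps term -> set of
    (doc_id, position) and duplicates maps a skipped chunk's location
    to the location of its first copy.
    """
    seen = {}                  # chunk text -> location of its first copy
    duplicates = {}            # skipped location -> location of first copy
    index = defaultdict(set)   # term -> locations of unique chunks only
    for doc_id, chunks in cluster.items():
        for pos, chunk in enumerate(chunks):
            loc = (doc_id, pos)
            if chunk in seen:
                # duplicate data: excluded from the indexed subset
                duplicates[loc] = seen[chunk]
                continue
            seen[chunk] = loc
            for term in chunk.lower().split():
                index[term].add(loc)
    return dict(index), duplicates
```

For example, if two documents in a cluster share a header chunk, only the first copy is indexed, and the `duplicates` map records where the second copy can be resolved.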