    • 1. Invention application
    • TEXT SAMPLE ENTRY GROUP FORMULATION
    • WO2015065859A2
    • 2015-05-07
    • PCT/US2014062309
    • 2014-10-27
    • MICROSOFT CORP
    • PETCULESCU CRISTIAN; DUMITRU MARIUS; PARASCHIV VASILE; NETZ AMIR; SANDERS PAUL JONATHON
    • G06F17/30
    • G06F17/30705; G06F17/30622; G06F17/30657
    • Storing text samples in a manner that the text samples may be quickly searched. The text samples are assigned a text sample identifier and are each parsed to thereby extract text components from the text samples. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the text sample identifier for the text sample from which the text component was parsed. A text sample entry group is created for each text sample that contains the text component entries in sequence for the text components found within the text sample. The text sample entry groups are stored so as to be scannable during a future search.
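The storage scheme in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method as claimed: parsing is simplified to lowercased whitespace splitting, and the names `build_entry_groups` and `scan_for` are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (text component id, text sample id)


def build_entry_groups(samples: List[str]) -> Tuple[Dict[str, int], List[List[Entry]]]:
    """Parse each sample and build one entry group per sample.

    Assumption: a "text component" is a lowercased whitespace-separated token.
    """
    component_ids: Dict[str, int] = {}
    groups: List[List[Entry]] = []
    for sample_id, text in enumerate(samples):
        group: List[Entry] = []
        for component in text.lower().split():
            # Components with the same content share one identifier.
            component_id = component_ids.setdefault(component, len(component_ids))
            group.append((component_id, sample_id))
        groups.append(group)  # entries stay in the order found in the sample
    return component_ids, groups


def scan_for(component_ids: Dict[str, int], groups: List[List[Entry]], term: str) -> List[int]:
    """Scan the stored entry groups and return ids of samples containing term."""
    wanted = component_ids.get(term.lower())
    if wanted is None:
        return []
    return sorted({sid for group in groups for cid, sid in group if cid == wanted})
```

Because every entry carries its own text sample identifier, a scan can report matching samples without consulting the original text.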
    • 2. Invention application
    • EFFICIENT LARGE-SCALE JOINING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES
    • WO2010039895A3
    • 2010-07-01
    • PCT/US2009059114
    • 2009-09-30
    • MICROSOFT CORP
    • PETCULESCU CRISTIAN; NETZ AMIR
    • G06F17/30; G06F17/00
    • G06F17/3048; G06F17/30315; G06F17/30498
    • The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically, with respect to join operations. Initially, a compact structure is received that represents the data according to a column based organization, and various compression and data packing techniques, already enabling a highly efficient and fast query response in real-time. On top of already fast querying enabled by the compact column oriented structure, a scalable, fast algorithm is provided for query processing in memory, which constructs an auxiliary data structure, also column-oriented, for use in join operations, which further leverages characteristics of in-memory data processing and access, as well as the column-oriented characteristics of the compact data structure.
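One way to picture a column-oriented auxiliary structure for joins is sketched below. The abstract does not specify the structure, so everything here is an assumption: the foreign-key column is dictionary encoded, and the auxiliary column maps each dictionary id to a row index in the joined table. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def dictionary_encode(values: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Replace each value by its integer id in a sorted dictionary."""
    dictionary = sorted(set(values))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in values]


def build_join_column(fact_dictionary: Sequence[str], dim_keys: Sequence[str]) -> List[int]:
    """Auxiliary column: for each fact-side dictionary id, the matching
    dimension row index, or -1 when no dimension row has that key."""
    dim_row = {key: row for row, key in enumerate(dim_keys)}
    return [dim_row.get(value, -1) for value in fact_dictionary]


def join(fact_ids: Sequence[int], join_column: Sequence[int],
         dim_column: Sequence[str]) -> List[Optional[str]]:
    """Resolve the join per fact row with two array lookups, no hashing."""
    return [dim_column[join_column[i]] if join_column[i] >= 0 else None
            for i in fact_ids]
```

In this sketch the auxiliary column has one slot per distinct key rather than per row, so it is built once and reused for every row during the in-memory scan.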
    • 3. Invention application
    • EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE
    • WO2010014956A3
    • 2010-06-10
    • PCT/US2009052491
    • 2009-07-31
    • MICROSOFT CORP
    • NETZ AMIR; PETCULESCU CRISTIAN; CRIVAT IOAN BOGDAN
    • G06F7/76; G06F7/78
    • G06F17/30501; G06F17/30315; H03M7/30; H03M7/48
    • The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
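The layering described in the abstract above (dictionary encoding, then a hybrid of run length encoding and bit packing) can be sketched as follows. This is a simplified illustration: the "analysis of bit savings" is replaced by an assumed fixed threshold `min_run`, and the segment layout and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def dictionary_encode(column: Sequence[str]) -> Tuple[List[str], List[int]]:
    """First layer: map each distinct value to a small integer id."""
    dictionary = sorted(set(column))
    index = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [index[value] for value in column]


def pack_bits(values: Sequence[int]) -> Tuple[str, int, int, int]:
    """Pack ids into one integer using the minimum common bit width."""
    width = max(1, max(values).bit_length())
    packed = 0
    for k, value in enumerate(values):
        packed |= value << (k * width)
    return ("pack", width, packed, len(values))


def hybrid_compress(ids: Sequence[int], min_run: int = 3) -> List[tuple]:
    """Greedy pass: long runs become ("rle", value, count) segments,
    everything in between is bit packed. min_run is an assumed stand-in
    for a real bit-savings estimate."""
    segments: List[tuple] = []
    literal: List[int] = []
    i = 0
    while i < len(ids):
        j = i
        while j < len(ids) and ids[j] == ids[i]:
            j += 1
        if j - i >= min_run:
            if literal:
                segments.append(pack_bits(literal))
                literal = []
            segments.append(("rle", ids[i], j - i))
        else:
            literal.extend(ids[i:j])
        i = j
    if literal:
        segments.append(pack_bits(literal))
    return segments


def decompress(segments: Sequence[tuple]) -> List[int]:
    """Inverse of hybrid_compress, used here to check the round trip."""
    out: List[int] = []
    for seg in segments:
        if seg[0] == "rle":
            out.extend([seg[1]] * seg[2])
        else:
            _, width, packed, count = seg
            mask = (1 << width) - 1
            out.extend((packed >> (k * width)) & mask for k in range(count))
    return out
```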
    • 4. Invention application
    • EFFICIENT LARGE-SCALE PROCESSING OF COLUMN BASED DATA ENCODED STRUCTURES
    • WO2010014955A3
    • 2010-04-22
    • PCT/US2009052490
    • 2009-07-31
    • MICROSOFT CORP
    • NETZ AMIR; PETCULESCU CRISTIAN
    • G06F17/30; G06F17/00
    • G06F17/30492
    • The subject disclosure relates to efficient query processing over large scale data storage. An exemplary process includes retrieving a subset of columns implicated by a query as integer encoded and compressed sequences of values corresponding to different columns of data, defining query processing buckets that span over the subset of columns based on changes of compression type occurring in the integer encoded and compressed sequences of values of the subset of data and processing the query in memory on a bucket by bucket basis and processing the query based on type of current bucket when processing the integer encoded and compressed sequences of values. The column based organization of the data, and the application of a hybrid run length encoding and bit packing technique, enable a highly efficient and speedy query response in real-time.
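The bucket-by-bucket processing described in the abstract above can be illustrated with a small sketch. The assumptions are mine: each column is a list of segments, either `("rle", value, count)` or `("rows", [values])` standing in for a bit-packed run, and the query is a filtered sum. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def boundaries(segments: Sequence[tuple]) -> List[int]:
    """Row positions at which a column switches to a new segment."""
    pos, cuts = 0, []
    for seg in segments:
        pos += seg[2] if seg[0] == "rle" else len(seg[1])
        cuts.append(pos)
    return cuts


def make_buckets(*columns: Sequence[tuple]) -> List[Tuple[int, int]]:
    """Buckets span all columns and end wherever any column changes segment."""
    cuts = sorted({0}.union(*(boundaries(c) for c in columns)))
    return list(zip(cuts, cuts[1:]))


def slice_segment(segments: Sequence[tuple], start: int, end: int) -> tuple:
    """Part of a column covering rows [start, end), which by construction
    of the buckets lies inside a single segment."""
    pos = 0
    for seg in segments:
        length = seg[2] if seg[0] == "rle" else len(seg[1])
        if start < pos + length:
            if seg[0] == "rle":
                return ("rle", seg[1], end - start)
            return ("rows", seg[1][start - pos:end - pos])
        pos += length
    raise IndexError("row range outside column")


def sum_where(filter_col: Sequence[tuple], value_col: Sequence[tuple], target: int) -> int:
    """SUM(value) WHERE filter == target, choosing a strategy per bucket type."""
    total = 0
    for start, end in make_buckets(filter_col, value_col):
        f = slice_segment(filter_col, start, end)
        v = slice_segment(value_col, start, end)
        n = end - start
        if f[0] == "rle" and f[1] != target:
            continue  # the whole bucket fails the filter
        if f[0] == "rle" and v[0] == "rle":
            total += v[1] * n  # pure run bucket: one multiplication
            continue
        f_rows = [f[1]] * n if f[0] == "rle" else f[1]
        v_rows = [v[1]] * n if v[0] == "rle" else v[1]
        total += sum(val for key, val in zip(f_rows, v_rows) if key == target)
    return total
```

Buckets where both columns are runs are resolved without touching individual rows, and only mixed buckets fall back to row-by-row work.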
    • 5. Invention application
    • EFFICIENT LARGE-SCALE FILTERING AND/OR SORTING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES
    • WO2010039898A2
    • 2010-04-08
    • PCT/US2009059118
    • 2009-09-30
    • MICROSOFT CORP
    • NETZ AMIR; PETCULESCU CRISTIAN; PREDESCU ADRIAN ILCU; DUMITRU MARIUS
    • G06F17/00; G06F17/30
    • G06F17/30501; G06F17/30315; G06F17/30448
    • The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically with respect to complex queries implicating filter and/or sort operations for data over a defined window. In this regard, in various embodiments, a method is provided that avoids scenarios involving expensive sorting of a high percentage of, or all, rows, either by not sorting any rows at all, or by sorting only a very small number of rows consistent with or smaller than a number of rows associated with the size of the requested window over the data. In one embodiment, this is achieved by splitting an external query request into two different internal sub-requests, a first one that computes statistics about distribution of rows for any specified WHERE clauses and ORDER BY columns, and a second one that selects only the rows that match the window based on the statistics.
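The two sub-request idea in the abstract above can be sketched as follows. This is a simplified, assumed version: the "statistics" are a per-value count of the ORDER BY column, and the window is given as `skip` and `top`. The function name `window_query` and its parameters are hypothetical.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence, TypeVar

Row = TypeVar("Row")


def window_query(rows: Sequence[Row],
                 where: Callable[[Row], bool],
                 order_key: Callable[[Row], Hashable],
                 skip: int,
                 top: int) -> List[Row]:
    """Return rows [skip, skip + top) of the filtered, ordered result
    while sorting only rows whose key value overlaps that window."""
    # Sub-request 1: statistics on how filtered rows distribute over key values.
    counts = Counter(order_key(r) for r in rows if where(r))

    # Walk the distinct key values (not the rows) to find those in the window.
    wanted = set()
    first_offset = None
    seen = 0
    for value in sorted(counts):
        lo, hi = seen, seen + counts[value]
        if hi > skip and lo < skip + top:
            if first_offset is None:
                first_offset = lo
            wanted.add(value)
        seen = hi
        if seen >= skip + top:
            break
    if not wanted:
        return []

    # Sub-request 2: select and sort only the rows that can fall in the window.
    candidates = [r for r in rows if where(r) and order_key(r) in wanted]
    candidates.sort(key=order_key)
    start = skip - first_offset
    return candidates[start:start + top]
```

The first pass sorts distinct values only, so the expensive row sort in the second pass touches a number of rows close to the window size when values are spread evenly.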