Patent Quick Search - Quickly search global patents, free commercial patent database - IPRDB

1. Invention application

WO2015065859A2 TEXT SAMPLE ENTRY GROUP FORMULATION (under examination, published)
Publication number: WO2015065859A2
Publication date: 2015-05-07
Application number: PCT/US2014062309
Filing date: 2014-10-27
Applicant: MICROSOFT CORP
Inventors: PETCULESCU CRISTIAN, DUMITRU MARIUS, PARASCHIV VASILE, NETZ AMIR, SANDERS PAUL JONATHON
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06F17/30622, G06F17/30657
Abstract: Storing text samples in a manner that the text samples may be quickly searched. The text samples are assigned a text sample identifier and are each parsed to thereby extract text components from the text samples. Text components that have the same content are assigned the same text component identifier. For each parsed text component, a text component entry is created that includes the assigned text component identifier as well as the text sample identifier for the text sample from which the text component was parsed. A text sample entry group is created for each text sample that contains the text component entries in sequence for the text components found within the text sample. The text sample entry groups are stored so as to be scannable during a future search.
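The storage scheme in the abstract of WO2015065859A2 can be illustrated with a minimal sketch. It assumes whitespace tokenization and case folding as the "parsing" step, and the function names `build_entry_groups` and `search` are hypothetical; the patent itself does not specify any of these details.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (text component id, text sample id)

def build_entry_groups(samples: List[str]) -> Tuple[Dict[str, int], List[List[Entry]]]:
    """Assign ids and build one entry group per text sample."""
    component_ids: Dict[str, int] = {}
    groups: List[List[Entry]] = []
    for sample_id, text in enumerate(samples):
        group: List[Entry] = []
        # Assumption: components are lowercase whitespace-separated tokens.
        for token in text.lower().split():
            # Components with the same content share one component id.
            comp_id = component_ids.setdefault(token, len(component_ids))
            group.append((comp_id, sample_id))
        groups.append(group)  # entries stay in the order they were parsed
    return component_ids, groups

def search(component_ids: Dict[str, int], groups: List[List[Entry]], term: str) -> List[int]:
    """Scan the stored entry groups for samples containing the term."""
    comp_id = component_ids.get(term.lower())
    if comp_id is None:
        return []
    hits = {sid for group in groups for cid, sid in group if cid == comp_id}
    return sorted(hits)

ids, groups = build_entry_groups(["red apple", "green apple pie", "red car"])
print(search(ids, groups, "apple"))
```

Searching compares small integer ids rather than strings, which is one plausible reason the entry groups are cheap to scan.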

2. Invention application

WO2010039895A3 EFFICIENT LARGE-SCALE JOINING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES (under examination, published)
Publication number: WO2010039895A3
Publication date: 2010-07-01
Application number: PCT/US2009059114
Filing date: 2009-09-30
Applicant: MICROSOFT CORP
Inventors: PETCULESCU CRISTIAN, NETZ AMIR
IPC classification: G06F17/30, G06F17/00
CPC classification: G06F17/3048, G06F17/30315, G06F17/30498
Abstract: The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically, with respect to join operations. Initially, a compact structure is received that represents the data according to a column based organization, and various compression and data packing techniques, already enabling a highly efficient and fast query response in real-time. On top of already fast querying enabled by the compact column oriented structure, a scalable, fast algorithm is provided for query processing in memory, which constructs an auxiliary data structure, also column-oriented, for use in join operations, which further leverages characteristics of in-memory data processing and access, as well as the column-oriented characteristics of the compact data structure.

3. Invention application

WO2010014956A3 EFFICIENT COLUMN BASED DATA ENCODING FOR LARGE-SCALE DATA STORAGE (under examination, published)
Publication number: WO2010014956A3
Publication date: 2010-06-10
Application number: PCT/US2009052491
Filing date: 2009-07-31
Applicant: MICROSOFT CORP
Inventors: NETZ AMIR, PETCULESCU CRISTIAN, CRIVAT IOAN BOGDAN
IPC classification: G06F7/76, G06F7/78
CPC classification: G06F17/30501, G06F17/30315, H03M7/30, H03M7/48
Abstract: The subject disclosure relates to column based data encoding where raw data to be compressed is organized by columns, and then, as first and second layers of reduction of the data size, dictionary encoding and/or value encoding are applied to the data as organized by columns, to create integer sequences that correspond to the columns. Next, a hybrid greedy run length encoding and bit packing compression algorithm further compacts the data according to an analysis of bit savings. Synergy of the hybrid data reduction techniques in concert with the column-based organization, coupled with gains in scanning and querying efficiency owing to the representation of the compact data, results in substantially improved data compression at a fraction of the cost of conventional systems.
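As a rough illustration of the two layers named in the abstract of WO2010014956A3, the sketch below dictionary-encodes a column into integers and then splits the integer sequence into run-length segments and bit-packed segments. The fixed `min_run` threshold stands in for the patent's "analysis of bit savings", which the abstract does not detail. The packed segments only record the bit width they would need and are not physically packed. All function names are hypothetical.

```python
from itertools import groupby

def dictionary_encode(column):
    """Map each distinct value to a small integer id."""
    dictionary = {}
    ids = [dictionary.setdefault(v, len(dictionary)) for v in column]
    return dictionary, ids

def hybrid_compress(ids, min_run=3):
    """Greedy split: long runs become RLE segments, the rest is 'packed'."""
    segments = []
    pending = []  # values waiting to go into a bit-packed segment

    def flush():
        if pending:
            # Bit width needed for the largest pending id (at least 1 bit).
            width = max(max(pending).bit_length(), 1)
            segments.append(("packed", width, list(pending)))
            pending.clear()

    for value, run in groupby(ids):
        count = len(list(run))
        if count >= min_run:  # assumption: fixed threshold, not a cost model
            flush()
            segments.append(("rle", value, count))
        else:
            pending.extend([value] * count)
    flush()
    return segments

def decompress(segments):
    """Expand segments back into the integer id sequence."""
    out = []
    for seg in segments:
        if seg[0] == "rle":
            out.extend([seg[1]] * seg[2])
        else:
            out.extend(seg[2])
    return out

dictionary, ids = dictionary_encode(list("aaaabcbccc"))
print(hybrid_compress(ids))
```

A real implementation would weigh the bits saved by each candidate run against the overhead of starting a new segment instead of using a constant threshold.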

4. Invention application

WO2010014955A3 EFFICIENT LARGE-SCALE PROCESSING OF COLUMN BASED DATA ENCODED STRUCTURES (under examination, published)
Publication number: WO2010014955A3
Publication date: 2010-04-22
Application number: PCT/US2009052490
Filing date: 2009-07-31
Applicant: MICROSOFT CORP
Inventors: NETZ AMIR, PETCULESCU CRISTIAN
IPC classification: G06F17/30, G06F17/00
CPC classification: G06F17/30492
Abstract: The subject disclosure relates to efficient query processing over large scale data storage. An exemplary process includes retrieving a subset of columns implicated by a query as integer encoded and compressed sequences of values corresponding to different columns of data, defining query processing buckets that span over the subset of columns based on changes of compression type occurring in the integer encoded and compressed sequences of values of the subset of data and processing the query in memory on a bucket by bucket basis and processing the query based on type of current bucket when processing the integer encoded and compressed sequences of values. The column based organization of the data, and the application of a hybrid run length encoding and bit packing technique, enable a highly efficient and speedy query response in real-time.
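One possible reading of the bucket definition in the abstract of WO2010014955A3 is sketched below. It assumes each column is described as a list of `(kind, length)` segments covering the same rows, and that a new bucket starts wherever any column changes compression type. The representation and the name `define_buckets` are assumptions for illustration, not the patent's data structures.

```python
def define_buckets(columns):
    """columns: dict of column name -> list of (kind, length) segments.

    Returns (start_row, end_row, {column: kind}) tuples such that no column
    changes compression type inside a bucket.
    """
    # Collect every row offset at which some column starts a new segment.
    boundaries = {0}
    for segments in columns.values():
        pos = 0
        for _, length in segments:
            pos += length
            boundaries.add(pos)
    cuts = sorted(boundaries)

    buckets = []
    for start, end in zip(cuts, cuts[1:]):
        kinds = {}
        for name, segments in columns.items():
            pos = 0
            for kind, length in segments:
                if pos <= start < pos + length:  # segment covering this bucket
                    kinds[name] = kind
                    break
                pos += length
        buckets.append((start, end, kinds))
    return buckets

columns = {
    "region": [("rle", 4), ("packed", 2)],
    "amount": [("packed", 3), ("rle", 3)],
}
for bucket in define_buckets(columns):
    print(bucket)
```

Within each bucket a query engine could then pick a processing strategy from the combination of kinds, for example handling an all-RLE bucket once per run instead of once per row.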

5. Invention application

WO2010039898A2 EFFICIENT LARGE-SCALE FILTERING AND/OR SORTING FOR QUERYING OF COLUMN BASED DATA ENCODED STRUCTURES (under examination, published)
Publication number: WO2010039898A2
Publication date: 2010-04-08
Application number: PCT/US2009059118
Filing date: 2009-09-30
Applicant: MICROSOFT CORP
Inventors: NETZ AMIR, PETCULESCU CRISTIAN, PREDESCU ADRIAN ILCU, DUMITRU MARIUS
IPC classification: G06F17/00, G06F17/30
CPC classification: G06F17/30501, G06F17/30315, G06F17/30448
Abstract: The subject disclosure relates to querying of column based data encoded structures enabling efficient query processing over large scale data storage, and more specifically with respect to complex queries implicating filter and/or sort operations for data over a defined window. In this regard, in various embodiments, a method is provided that avoids scenarios involving expensive sorting of a high percentage of, or all, rows, either by not sorting any rows at all, or by sorting only a very small number of rows consistent with or smaller than a number of rows associated with the size of the requested window over the data. In one embodiment, this is achieved by splitting an external query request into two different internal sub-requests, a first one that computes statistics about distribution of rows for any specified WHERE clauses and ORDER BY columns, and a second one that selects only the rows that match the window based on the statistics.
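The two-sub-request idea in the abstract of WO2010039898A2 can be sketched as follows. The first pass builds a histogram of the ORDER BY value over rows that pass the WHERE predicate; the second pass selects and sorts only rows whose value overlaps the requested window. Rows as Python dicts, the `skip`/`take` window parameters, and the name `windowed_query` are assumptions made for illustration.

```python
from collections import Counter

def windowed_query(rows, predicate, order_key, skip, take):
    """Return rows [skip, skip + take) of the filtered, ordered result."""
    # Sub-request 1: distribution of ORDER BY values among matching rows.
    counts = Counter(order_key(r) for r in rows if predicate(r))

    # Walk the distinct values in order and keep those overlapping the window.
    wanted = set()
    first_start = None  # rank of the first row carrying a wanted value
    seen = 0
    for value in sorted(counts):
        start, end = seen, seen + counts[value]
        if end > skip and start < skip + take:
            if first_start is None:
                first_start = start
            wanted.add(value)
        seen = end
    if first_start is None:
        return []

    # Sub-request 2: select only candidate rows, then sort this small set.
    candidates = [r for r in rows if predicate(r) and order_key(r) in wanted]
    candidates.sort(key=order_key)
    offset = skip - first_start
    return candidates[offset:offset + take]

rows = [{"id": i, "score": s} for i, s in enumerate([5, 1, 3, 3, 9, 1, 7, 3])]
page = windowed_query(rows, lambda r: r["score"] > 1, lambda r: r["score"], skip=2, take=3)
print([r["id"] for r in page])
```

The sketch still sorts the distinct values of the histogram, so it pays off mainly when the ORDER BY column has far fewer distinct values than rows, as with dictionary-encoded columns.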

6. Invention application

WO2009006028A3 EXPLAINING CHANGES IN MEASURES THRU DATA MINING (under examination, published)
Publication number: WO2009006028A3
Publication date: 2009-03-05
Application number: PCT/US2008067363
Filing date: 2008-06-18
Applicant: MICROSOFT CORP
Inventors: CRIVAT IOAN BOGDAN, PETCULESCU CRISTIAN, NETZ AMIR
IPC classification: G06F17/18, G06F17/40
CPC classification: G06F17/30592
Abstract: Systems and methodologies for identification of factors that cause significant shifts in transactions in a relational store and/or OLAP environment. Transactions are grouped into significant categories defined across the whole data space, to detect interesting sub spaces transactions. Subsequently, sub spaces that show strong variance between two slices can be selected, followed by grouping the subspaces in sub reports to measure the coverage for each sub report. A final report can then be generated that contains list of sub-reports detected in the previous acts.
