Quick Patent Search - Search global patents quickly, free commercial-use patent database - IPRDB

1. Invention application

US20070214135A1 Partitioning of data mining training set (in force)
Publication (announcement) No.: US20070214135A1
Publication (announcement) date: 2007-09-13
Application No.: US11371477
Filing date: 2006-03-09
Applicants: Ioan Crivat, Raman Iyer, C. MacLennan
Inventors: Ioan Crivat, Raman Iyer, C. MacLennan
IPC classification: G06F17/30
CPC classification: G06F17/30539
Abstract: A system that effectuates fetching a complete set of relational data into a mining services server and subsequently defining desired partitions upon the fetched data is provided. In accordance with the innovation, the data can be locally cached and partitioned therefrom. Accordingly, upon the same mining structure (e.g., cache) that has been partitioned, the novel innovation can build mining models for each partition. In other words, the innovation can employ the concept of mining structure as a data cache while manipulating only partitions of this cache in certain operations. The innovation can be employed in scenarios where a user wants to train a mining model using only data points that satisfy a particular Boolean condition, a user wants to split the training set into multiple partitions (e.g., training/testing) and/or a user wants to perform a data mining procedure known as “N-fold cross validation.”
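The three scenarios named in the abstract (a Boolean filter, a training/testing split, and N-fold cross validation) can be illustrated with a minimal sketch. This is not the patent's implementation or any vendor API: the class `MiningStructureCache` and its methods `filter_partition`, `holdout_split`, and `n_folds` are hypothetical names chosen for illustration, and the sample rows are made up. The sketch only assumes the idea stated in the abstract: the data is fetched and cached once, and each partition is defined over that cache rather than re-fetched.

```python
import random
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


class MiningStructureCache:
    """Hypothetical stand-in for a mining structure used as a data cache."""

    def __init__(self, rows: Iterable[Row]) -> None:
        # The complete data set is fetched once and kept as a single copy.
        self.rows: List[Row] = list(rows)

    def filter_partition(self, predicate: Callable[[Row], bool]) -> List[int]:
        """Indices of cached rows that satisfy a Boolean condition."""
        return [i for i, row in enumerate(self.rows) if predicate(row)]

    def holdout_split(
        self, test_fraction: float, seed: int = 0
    ) -> Tuple[List[int], List[int]]:
        """Shuffle row indices and split them into (training, testing)."""
        indices = list(range(len(self.rows)))
        random.Random(seed).shuffle(indices)
        n_test = int(len(indices) * test_fraction)
        return indices[n_test:], indices[:n_test]

    def n_folds(self, n: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
        """Return n (training, testing) pairs; each row is tested exactly once."""
        indices = list(range(len(self.rows)))
        random.Random(seed).shuffle(indices)
        folds = [indices[k::n] for k in range(n)]
        return [
            ([i for j, fold in enumerate(folds) if j != k for i in fold], folds[k])
            for k in range(n)
        ]


# Made-up sample data: ten rows with ages 20..29.
cache = MiningStructureCache({"id": i, "age": 20 + i} for i in range(10))

over_25 = cache.filter_partition(lambda row: row["age"] > 25)
train, test = cache.holdout_split(test_fraction=0.3)
folds = cache.n_folds(5)
```

Each partition here is just a list of indices into the cached rows, so several models can be trained on different partitions without copying or re-reading the underlying data.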
