    • 21. Invention application
    • Two-level n-gram index structure and methods of index building, query processing and index derivation
    • US20070050384A1
    • 2007-03-01
    • US11501265
    • 2006-08-09
    • Kyu-Young Whang; Min-Soo Kim; Jae-Gil Lee; Min-Jae Lee
    • G06F7/00
    • G06F17/30622
    • Disclosed are a structure of a two-level n-gram inverted index and methods of building it, processing queries, and deriving the index, which reduce the size of the n-gram inverted index and improve query performance by eliminating the redundancy of the position information that exists in the n-gram inverted index. The inverted index of the present invention comprises a back-end inverted index that uses subsequences extracted from documents as terms and a front-end inverted index that uses n-grams extracted from the subsequences as terms. The back-end inverted index uses as terms subsequences of a specific length, extracted from the documents so that they overlap each other by n−1 (n: the length of the n-gram), and stores the position information of the subsequences occurring in the documents in a posting list for each subsequence. The front-end inverted index uses as terms n-grams of a specific length, extracted from the subsequences using a 1-sliding technique, and stores the position information of the n-grams occurring in the subsequences in a posting list for each n-gram.
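
The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the parameter `m` (the "specific length" of a subsequence, which the abstract does not name), the character-level n-grams, and the choice to let the final subsequence be shorter than `m` are all assumptions made for illustration. The back-end index maps each subsequence to its positions in the documents, the front-end index maps each n-gram to its positions inside the subsequences, and joining the two recovers the document-level positions of an n-gram.

```python
from collections import defaultdict


def extract_subsequences(text, m, n):
    """Split text into length-m subsequences that overlap by n-1 characters.

    Assumption: the last subsequence may be shorter than m (no padding).
    """
    if n < 1 or m < n:
        raise ValueError("require 1 <= n <= m")
    if not text:
        return []
    step = m - (n - 1)  # consecutive subsequences share n-1 characters
    subs = []
    offset = 0
    while True:
        subs.append((offset, text[offset:offset + m]))
        if offset + m >= len(text):
            break
        offset += step
    return subs


def build_two_level_index(docs, m, n):
    """Build hypothetical back-end and front-end inverted indexes.

    docs maps a document id to its text.
    """
    # Back-end: subsequence -> posting list of (doc_id, offset in document)
    back_end = defaultdict(list)
    for doc_id, text in docs.items():
        for offset, sub in extract_subsequences(text, m, n):
            back_end[sub].append((doc_id, offset))

    # Front-end: n-gram -> posting list of (subsequence, offset in subsequence),
    # extracted with a window that slides one character at a time (1-sliding).
    front_end = defaultdict(list)
    for sub in back_end:
        for i in range(len(sub) - n + 1):
            front_end[sub[i:i + n]].append((sub, i))

    return dict(back_end), dict(front_end)


def ngram_positions(back_end, front_end, gram):
    """Join both levels to recover the document-level positions of an n-gram."""
    positions = set()
    for sub, inner in front_end.get(gram, []):
        for doc_id, offset in back_end[sub]:
            positions.add((doc_id, offset + inner))
    return sorted(positions)
```

For example, with `docs = {0: "ABCDDABBCD", 1: "DABCDABCDA"}`, `m=4` and `n=2`, document 0 is split into `ABCD`, `DDAB` and `BBCD`, and `ngram_positions(back_end, front_end, "AB")` returns `[(0, 0), (0, 5), (1, 1), (1, 5)]`. Because the n−1 overlap makes every n-gram of a document fall inside exactly one subsequence, a subsequence that recurs in several places stores its n-gram offsets only once in the front-end index, which is the redundancy the abstract refers to.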