Quick Patent Search: fast search of global patents, free patent database for commercial use - IPRDB

11. Invention application

US20080091666A1 Method and System for Processing a Text Search Query in a Collection of Documents (legal status: in force)
Publication (announcement) number: US20080091666A1
Publication (announcement) date: 2008-04-17
Application number: US11952627
Filing date: 2007-12-07
Applicants: Andrea Baader, Jochen Doerre, Monika Matschke, Andreas Neumann, Roland Seiffert
Inventors: Andrea Baader, Jochen Doerre, Monika Matschke, Andreas Neumann, Roland Seiffert
IPC classification: G06F17/30
CPC classification: G06F17/30646, Y10S707/99933, Y10S707/99942, Y10S707/99943
Abstract: The invention provides a method and an infrastructure for processing a text search query in a collection of documents (100). A full posting index (200) is generated, stored, and updated for each document added to the collection (100). The full posting index (200) comprises a set of index terms and, for each index term of the set, a full posting list enumerating all occurrences of that index term in all documents of the collection (100). In addition to the full posting index (200), at least one additional posting index (400, 500, 600) is generated, stored, and updated for each document added to the collection (100). Each additional posting index (400, 500, 600) relates to a defined document part and comprises a set of index terms and, for each index term of the set, a restricted posting list enumerating all occurrences of that index term in that document part across all documents of the collection (100). A text search query comprises search conditions on search terms, which are translated into conditions on the index terms of the full posting index (200). The translated conditions of a given text search query are then optimized (a) by identifying all translated conditions that are restricted to a defined document part for which an additional posting index is available, and (b) by rewriting each identified part-restricted condition as a pair condition on an index term of the additional posting index (400, 500, 600) and the corresponding document part. These pair conditions can thus be processed using only the additional posting index (400, 500, 600).
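The indexing and rewriting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. The names `PostingIndex`, `Collection`, `rewrite`, and `search`, the whitespace tokenizer, the `(doc_id, part, position)` posting layout, and the AND-only query semantics are all assumptions made for illustration.

```python
from collections import defaultdict


class PostingIndex:
    """Maps each index term to a posting list of (doc_id, part, position)."""

    def __init__(self):
        self.postings = defaultdict(list)

    def add(self, term, doc_id, part, position):
        self.postings[term].append((doc_id, part, position))

    def docs(self, term, part=None):
        """Documents containing term, optionally filtered to one part."""
        return {d for d, p, _ in self.postings.get(term, ())
                if part is None or p == part}


class Collection:
    """Hypothetical collection with one full index plus per-part indexes."""

    def __init__(self, indexed_parts):
        self.full = PostingIndex()
        self.additional = {part: PostingIndex() for part in indexed_parts}

    def add_document(self, doc_id, parts):
        """parts maps a part name to its text; every index is updated."""
        position = 0
        for part, text in parts.items():
            for term in text.lower().split():  # assumed tokenizer
                self.full.add(term, doc_id, part, position)
                if part in self.additional:
                    self.additional[part].add(term, doc_id, part, position)
                position += 1

    def rewrite(self, conditions):
        """Tag each (term, part) condition with the index that serves it."""
        return [("additional" if part in self.additional else "full", term, part)
                for term, part in conditions]

    def search(self, conditions):
        """AND all conditions together; returns the matching doc ids."""
        result = None
        for index_name, term, part in self.rewrite(conditions):
            if index_name == "additional":
                # Pair condition: only the restricted posting list is read.
                hits = self.additional[part].docs(term)
            else:
                # Fallback: scan the full posting list, filtering by part.
                hits = self.full.docs(term, part)
            result = hits if result is None else result & hits
        return result or set()


collection = Collection(indexed_parts=["title"])
collection.add_document(1, {"title": "text search", "body": "posting index for documents"})
collection.add_document(2, {"title": "index design", "body": "text search in a collection"})

query = [("search", "title"), ("index", None)]
plan = collection.rewrite(query)
matches = collection.search(query)
```

In this sketch the condition on `title` is served entirely from the smaller title index, while the unrestricted condition and any condition on a part without its own index fall back to the full posting index.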
