    • 9. Invention application
    • Document compression system and method for use with tokenspace repository
    • US20070220023A1
    • 2007-09-20
    • US10917739
    • 2004-08-13
    • Jeffrey Dean, Gautham Thambidorai, Sanjay Ghemawat, Benedict Gomes, Olcan Sercinoglu
    • Jeffrey Dean, Gautham Thambidorai, Sanjay Ghemawat, Benedict Gomes, Olcan Sercinoglu
    • G06F7/00
    • G06F17/30864, G06F17/30011, G06F17/30371, G06F17/30613
    • The disclosed embodiments enable multi-stage query scoring, including “snippet” generation, through incremental document reconstruction facilitated by a multi-tiered mapping scheme. The mapping scheme includes a first mapping between unique tokens contained in a set of documents and unique global token identifiers (e.g., 32-bit integers) contained in a global-lexicon (i.e., dictionary). The mapping scheme also includes a second mapping between the global token identifiers and a set of fixed-length local token identifiers (e.g., 8-bit integers) contained in one or more mini-lexicons (i.e., sub-dictionaries). Each mini-lexicon is associated with a range of token positions in the tokenized documents. The first and second mappings are used to encode/decode documents into local token identifiers having fixed widths which can be compactly stored in the tokenspace repository. The use of fixed-length local token identifiers allows for fast and efficient decoding of tokenized documents.
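
The two-tier mapping described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `encode`, `decode`, and `MAX_LOCAL` are hypothetical, and the rule used here for choosing position ranges (open a new mini-lexicon once the current one holds 256 distinct tokens) is an assumption, since the abstract does not say how ranges are chosen.

```python
from bisect import bisect_right

MAX_LOCAL = 256  # an 8-bit local identifier can name at most 256 distinct tokens


def encode(tokens, global_lexicon):
    """Encode tokens as one byte each, plus the mini-lexicons needed to decode.

    global_lexicon maps token -> global id and is extended in place.
    Each mini-lexicon is (start_position, table), where table[local_id]
    is the global id; it covers positions up to the next mini-lexicon's start.
    """
    local_ids = bytearray()
    mini_lexicons = []
    current = None  # global id -> local id for the mini-lexicon being filled
    for position, token in enumerate(tokens):
        # First mapping: token -> global id.
        gid = global_lexicon.setdefault(token, len(global_lexicon))
        # Assumed policy: start a new range when the local id space is full.
        if current is None or (gid not in current and len(current) == MAX_LOCAL):
            current = {}
            mini_lexicons.append((position, []))
        # Second mapping: global id -> fixed-width local id.
        if gid not in current:
            current[gid] = len(current)
            mini_lexicons[-1][1].append(gid)
        local_ids.append(current[gid])
    return bytes(local_ids), mini_lexicons


def decode(local_ids, mini_lexicons, global_tokens, start=0, stop=None):
    """Decode token positions [start, stop) without touching the rest."""
    stop = len(local_ids) if stop is None else stop
    starts = [s for s, _ in mini_lexicons]
    out = []
    for position in range(start, stop):
        # Find the mini-lexicon whose position range contains this token.
        _, table = mini_lexicons[bisect_right(starts, position) - 1]
        out.append(global_tokens[table[local_ids[position]]])
    return out


lexicon = {}
words = "the quick brown fox jumps over the lazy dog".split()
encoded, minis = encode(words, lexicon)
# Invert the global lexicon so that global_tokens[global_id] is the token.
global_tokens = [t for t, _ in sorted(lexicon.items(), key=lambda kv: kv[1])]
snippet = decode(encoded, minis, global_tokens, start=4, stop=7)
```

Because every local identifier occupies exactly one byte, token position `p` is always at byte offset `p`, so a range such as the `snippet` above can be reconstructed without decoding the whole document.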