    • 5. Invention application
    • Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text
    • US20110264997A1
    • 2011-10-27
    • US12764107
    • 2010-04-21
    • Kunal Mukerjee; Sorin Gherman
    • Kunal Mukerjee; Sorin Gherman
    • G06F17/30; G06F17/21
    • G06F16/3334
    • A search engine for documents containing text may process text using a statistical language model, classify the text based on entropy, and create suffix trees or other mappings of the text for each classification. From the suffix trees or mappings, a graph may be constructed with relationship strengths between different words or text strings. The graph may be used to determine search results, and may be browsed or navigated before viewing search results. As new documents are added, they may be processed and added to the suffix trees; the graph may then be created on demand in response to a search request. The graph may be represented as an adjacency matrix, and a transitive closure algorithm may process the adjacency matrix as a background process.
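The last step of this abstract (a graph stored as an adjacency matrix and processed by a transitive closure algorithm) can be illustrated with a minimal sketch. The abstract does not say how relationship strength is computed or which closure algorithm is used, so two assumptions are made here: strength is the number of documents in which two words co-occur, and the closure uses Warshall's algorithm. The entropy classification and suffix trees are omitted, and the function names `build_adjacency` and `transitive_closure` are hypothetical.

```python
from itertools import combinations


def build_adjacency(docs):
    """Return (terms, matrix); matrix[i][j] counts the documents in which
    terms i and j co-occur. Co-occurrence is an assumed stand-in for the
    abstract's 'relationship strength'."""
    terms = sorted({w for d in docs for w in d.lower().split()})
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    matrix = [[0] * n for _ in range(n)]
    for d in docs:
        present = sorted({index[w] for w in d.lower().split()})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return terms, matrix


def transitive_closure(matrix):
    """Warshall's algorithm: reach[i][j] is True when term j can be reached
    from term i through a chain of non-zero edges."""
    n = len(matrix)
    reach = [[matrix[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


docs = ["suffix tree", "tree graph", "entropy model"]
terms, adj = build_adjacency(docs)
reach = transitive_closure(adj)
```

In this toy corpus "suffix" and "graph" never share a document, so their direct edge weight is zero, yet the closure links them through "tree". Because the closure costs O(n³) in the number of terms, running it off the query path, as the abstract suggests with a background process, is a plausible design choice.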
    • 6. Granted invention patent
    • Quantized feature index trajectory
    • US07945441B2
    • 2011-05-17
    • US11835389
    • 2007-08-07
    • R. Donald Thompson; Kunal Mukerjee
    • R. Donald Thompson; Kunal Mukerjee
    • G10L19/00
    • G10L15/02; G10L19/0018; G10L2015/025
    • Indexing methods are described that may be used by databases, search engines, query and retrieval systems, context-sensitive data mining, context mapping, language identification, image recognition, and robotic systems. Raw baseline features from an input signal are aggregated, abstracted, and indexed for later retrieval or manipulation. The feature index is the quantization number for the underlying features that are represented by an abstraction. Trajectories are used to signify how the features evolve over time. Feature indexes are linked in an ordered sequence indicative of time quanta, where the sequence represents the underlying input signal. An example indexing system based on the described processes is an inverted index that creates a mapping from features or atoms to the underlying documents, files, or data. A highly optimized set of operations can be used to manipulate the quantized feature indexes, where the operations can be fine-tuned independently of the base feature set.
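The idea of quantized feature indexes, trajectories, and an inverted index can be sketched as follows. This is an illustration under stated assumptions, not the patented method: features are taken to be scalar values in [0, 1], the quantizer is a uniform one with four levels, and the names `quantize`, `trajectory`, and `build_inverted_index` are hypothetical.

```python
from collections import defaultdict


def quantize(value, lo=0.0, hi=1.0, levels=4):
    """Map a raw feature value to a quantization index in [0, levels - 1].
    A uniform quantizer is assumed; the abstract does not specify one."""
    clamped = min(max(value, lo), hi)
    step = (hi - lo) / levels
    return min(int((clamped - lo) / step), levels - 1)


def trajectory(signal):
    """Ordered sequence of feature indexes, one per time quantum."""
    return [quantize(v) for v in signal]


def build_inverted_index(signals):
    """Map each feature index to the set of signal names that contain it."""
    inverted = defaultdict(set)
    for name, signal in signals.items():
        for q in trajectory(signal):
            inverted[q].add(name)
    return inverted


# Two toy input signals, each a short series of raw feature values.
signals = {"a": [0.05, 0.30, 0.95], "b": [0.10, 0.60]}
traj_a = trajectory(signals["a"])
inverted = build_inverted_index(signals)
```

Once signals are reduced to small integer indexes, lookups and comparisons operate on those integers rather than on the raw features, which is one way to read the abstract's claim that the operations can be tuned independently of the base feature set.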
    • 10. Granted invention patent
    • Identifying key phrases within documents
    • US08423546B2
    • 2013-04-16
    • US12959840
    • 2010-12-03
    • Sorin Gherman; Kunal Mukerjee
    • Sorin Gherman; Kunal Mukerjee
    • G06F7/00; G06F17/30
    • G06F17/3053; G06F17/2715; G06F17/2745; G06F17/30864
    • The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpora of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight + cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly), and potentially large numbers of documents can be characterized by salient and relevant key phrases (tags).
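The "top N" selection described in this abstract can be approximated with a small sketch. The patent's actual weight and cross-entropy threshold functions are not given here, so the scoring below is an assumption: each word is scored by its contribution to the divergence between the document's unigram distribution and an add-one-smoothed background model, and words at or below a threshold are dropped. It ranks single words rather than multi-word phrases or tuples, and the names `unigram_model` and `top_n_terms` are hypothetical.

```python
import math
from collections import Counter


def unigram_model(texts):
    """Background language model: word counts plus the total token count."""
    counts = Counter(w for t in texts for w in t.lower().split())
    return counts, sum(counts.values())


def top_n_terms(doc, background, n=2, threshold=0.0):
    """Return up to n words whose score exceeds the threshold.
    Score = p_doc(w) * log(p_doc(w) / p_bg(w)), an assumed stand-in for
    the abstract's weight + cross-entropy threshold functions."""
    bg_counts, bg_total = background
    vocab = len(bg_counts) + 1  # +1 reserves probability mass for unseen words
    doc_counts = Counter(doc.lower().split())
    doc_total = sum(doc_counts.values())
    scored = []
    for word, count in doc_counts.items():
        p_doc = count / doc_total
        p_bg = (bg_counts[word] + 1) / (bg_total + vocab)  # add-one smoothing
        score = p_doc * math.log(p_doc / p_bg)
        if score > threshold:
            scored.append((score, word))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:n]]


background = unigram_model([
    "the cat sat on the mat",
    "the dog sat on the log",
    "the suffix tree indexes the text",
])
top = top_n_terms("the suffix tree stores the suffix links", background)
```

Each document is scored in a single pass over its tokens against a fixed background model, which is consistent with the linear scaling the abstract mentions.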