    • 2. Invention application
    • Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text
    • US20110264997A1
    • 2011-10-27
    • US12764107
    • 2010-04-21
    • Kunal Mukerjee; Sorin Gherman
    • Kunal Mukerjee; Sorin Gherman
    • G06F17/30; G06F17/21
    • G06F16/3334
    • A search engine for documents containing text may process text using a statistical language model, classify the text based on entropy, and create suffix trees or other mappings of the text for each classification. From the suffix trees or mappings, a graph may be constructed with relationship strengths between different words or text strings. The graph may be used to determine search results, and may be browsed or navigated before viewing search results. As new documents are added, they may be processed and added to the suffix trees; the graph may then be created on demand in response to a search request. The graph may be represented as an adjacency matrix, and a transitive closure algorithm may process the adjacency matrix as a background process.
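The last step of the abstract above (a graph stored as an adjacency matrix and processed by a transitive closure algorithm) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the abstract does not say how relationship strengths are computed, so document-level co-occurrence counts stand in for them, the suffix-tree stage is omitted, and the names `build_adjacency`, `transitive_closure`, and `min_strength` are hypothetical. The closure uses Warshall's algorithm, which is one standard choice and not necessarily the one the application uses.

```python
from itertools import combinations


def build_adjacency(docs, min_strength=1):
    """Return (terms, matrix). matrix[i][j] is True when terms i and j
    co-occur in at least `min_strength` documents (an assumed strength measure)."""
    terms = sorted({w for doc in docs for w in doc.lower().split()})
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    counts = [[0] * n for _ in range(n)]
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[index[a]][index[b]] += 1
            counts[index[b]][index[a]] += 1
    matrix = [[counts[i][j] >= min_strength for j in range(n)] for i in range(n)]
    return terms, matrix


def transitive_closure(matrix):
    """Warshall's algorithm: reach[i][j] is True if j is reachable from i."""
    n = len(matrix)
    reach = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


docs = ["suffix tree index", "index search", "graph closure"]
terms, adj = build_adjacency(docs)
closure = transitive_closure(adj)
i = {t: k for k, t in enumerate(terms)}
print(adj[i["suffix"]][i["search"]])      # direct edge between the two terms?
print(closure[i["suffix"]][i["search"]])  # indirectly related through "index"?
```

In this toy corpus "suffix" and "search" never share a document, so they have no direct edge, but the closure links them through "index". Because the closure is cubic in the number of terms, running it as a background process, as the abstract suggests, keeps it off the query path.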
    • 4. Invention application
    • IDENTIFYING KEY PHRASES WITHIN DOCUMENTS
    • US20120143860A1
    • 2012-06-07
    • US12959840
    • 2010-12-03
    • Sorin Gherman; Kunal Mukerjee
    • Sorin Gherman; Kunal Mukerjee
    • G06F17/30
    • G06F17/3053; G06F17/2715; G06F17/2745; G06F17/30864
    • The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpora of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly), and potentially large numbers of documents can be characterized by salient and relevant key phrases (tags).
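As a rough illustration of the "top N" summarization step described above, the sketch below scores each term by its pointwise contribution to the divergence between the document's unigram distribution and a background language model, keeps terms above a threshold, and returns the N best. This is an assumed stand-in: the abstract does not define its weight or cross-entropy threshold functions, and the function name `top_n_phrases`, the add-one style smoothing, and the toy background counts are all hypothetical.

```python
import math
from collections import Counter


def top_n_phrases(doc_tokens, background_counts, n=3, threshold=0.0):
    """Rank terms by p_doc * log(p_doc / p_bg) and keep the top n above threshold."""
    doc_counts = Counter(doc_tokens)
    doc_total = sum(doc_counts.values())
    bg_total = sum(background_counts.values())
    vocab = len(background_counts) + 1  # one extra slot for unseen terms
    scored = []
    for term, count in doc_counts.items():
        p_doc = count / doc_total
        # Add-one style smoothing so unseen terms get a nonzero background probability.
        p_bg = (background_counts.get(term, 0) + 1) / (bg_total + vocab)
        score = p_doc * math.log(p_doc / p_bg)
        if score > threshold:
            scored.append((score, term))
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda st: (-st[0], st[1]))
    return [term for _, term in scored[:n]]


background = Counter("the of the a search the of document the a".split())
doc = "the suffix tree index the suffix tree".split()
print(top_n_phrases(doc, background, n=2))
```

Terms that are as common in the background as in the document (here "the") score at or below zero and are dropped by the threshold. The single pass over the document's term counts is consistent with the linear scaling the abstract mentions.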
    • 5. Granted patent
    • Identifying key phrases within documents
    • US08423546B2
    • 2013-04-16
    • US12959840
    • 2010-12-03
    • Sorin Gherman; Kunal Mukerjee
    • Sorin Gherman; Kunal Mukerjee
    • G06F7/00; G06F17/30
    • G06F17/3053; G06F17/2715; G06F17/2745; G06F17/30864
    • The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpora of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly), and potentially large numbers of documents can be characterized by salient and relevant key phrases (tags).
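The word-breaking stage that feeds the scoring algorithm ("documents can be broken into tuples") might look like the sketch below. The tuple layout is an assumption: the abstract does not specify the fields, so `TermTuple` with `doc_id`, `position`, and `term` is a hypothetical schema chosen to resemble rows in a database table, and the regular expression is a simplistic stand-in for a real word breaker.

```python
import re
from typing import Iterator, NamedTuple


class TermTuple(NamedTuple):
    """Hypothetical row shape: one term occurrence per tuple."""
    doc_id: int
    position: int
    term: str


def word_break(doc_id: int, text: str) -> Iterator[TermTuple]:
    """Yield one lowercased (doc_id, position, term) tuple per alphanumeric run."""
    for position, match in enumerate(re.finditer(r"[A-Za-z0-9]+", text)):
        yield TermTuple(doc_id, position, match.group().lower())


rows = list(word_break(7, "Identifying key phrases, within documents."))
print(rows[:2])
```

Emitting flat tuples like these keeps each document independent of the others, which is one way a pipeline could process a large corpus in a streaming fashion.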