    • 5. Invention application
    • Scalable Incremental Semantic Entity and Relatedness Extraction from Unstructured Text
    • US20110264997A1
    • 2011-10-27
    • US12764107
    • 2010-04-21
    • Kunal Mukerjee; Sorin Gherman
    • Kunal Mukerjee; Sorin Gherman
    • G06F17/30; G06F17/21
    • G06F16/3334
    • A search engine for documents containing text may process text using a statistical language model, classify the text based on entropy, and create suffix trees or other mappings of the text for each classification. From the suffix trees or mappings, a graph may be constructed with relationship strengths between different words or text strings. The graph may be used to determine search results, and may be browsed or navigated before viewing search results. As new documents are added, they may be processed and added to the suffix trees, then the graph may be created on demand in response to a search request. The graph may be represented as an adjacency matrix, and a transitive closure algorithm may process the adjacency matrix as a background process.
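The adjacency-matrix and transitive-closure step described in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions that are not in the patent text: relationship strength is approximated by document co-occurrence counts rather than derived from suffix trees, the closure is computed with Warshall's algorithm, and the names `build_adjacency` and `transitive_closure` are hypothetical.

```python
from itertools import combinations


def build_adjacency(docs):
    """Return (terms, matrix); matrix[i][j] counts documents where terms i and j co-occur.

    Assumption: co-occurrence counts stand in for the "relationship strengths".
    """
    terms = sorted({w for d in docs for w in d.lower().split()})
    index = {t: i for i, t in enumerate(terms)}
    n = len(terms)
    matrix = [[0] * n for _ in range(n)]
    for d in docs:
        present = sorted({index[w] for w in d.lower().split()})
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1
    return terms, matrix


def transitive_closure(matrix):
    """Warshall's algorithm: reach[i][j] is True if j is reachable from i via nonzero edges."""
    n = len(matrix)
    reach = [[matrix[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach


docs = ["suffix tree index", "tree graph", "entropy model"]
terms, adj = build_adjacency(docs)
closure = transitive_closure(adj)
```

In this toy corpus, "suffix" and "graph" never share a document, yet the closure links them through "tree"; "entropy" stays unreachable from "suffix". Adding a document only increments matrix entries, which is loosely in the spirit of the incremental processing the abstract mentions.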
    • 6. Granted invention patent
    • Quantized feature index trajectory
    • US07945441B2
    • 2011-05-17
    • US11835389
    • 2007-08-07
    • R. Donald Thompson; Kunal Mukerjee
    • R. Donald Thompson; Kunal Mukerjee
    • G10L19/00
    • G10L15/02; G10L19/0018; G10L2015/025
    • Indexing methods are described that may be used by databases, search engines, query and retrieval systems, context sensitive data mining, context mapping, language identification, image recognition, and robotic systems. Raw baseline features from an input signal are aggregated, abstracted and indexed for later retrieval or manipulation. The feature index is the quantization number for the underlying features that are represented by an abstraction. Trajectories are used to signify how the features evolve over time. Feature indexes are linked in an ordered sequence indicative of time quanta, where the sequence represents the underlying input signal. An example indexing system based on the described processes is an inverted index that creates a mapping from features or atoms to the underlying documents, files, or data. A highly optimized set of operations can be used to manipulate the quantized feature indexes, where the operations can be fine-tuned independently of the base feature set.
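A rough sketch of the ideas in the abstract above: quantizing features to an index, linking the indexes into a time-ordered trajectory, and building an inverted index from feature index to source. The uniform bin width `step`, the scalar features, and the names `quantize`, `trajectory`, and `build_inverted_index` are all illustrative assumptions; the patent does not specify them here.

```python
import math
from collections import defaultdict


def quantize(value, step=0.5):
    """Map a raw scalar feature to an integer quantization index (uniform bins, assumed)."""
    return math.floor(value / step)


def trajectory(signal, step=0.5):
    """Ordered sequence of feature indexes, one per time quantum of the input signal."""
    return [quantize(v, step) for v in signal]


def build_inverted_index(signals, step=0.5):
    """Map each feature index to the set of signal names whose trajectory contains it."""
    inverted = defaultdict(set)
    for name, signal in signals.items():
        for q in trajectory(signal, step):
            inverted[q].add(name)
    return inverted


signals = {"a": [0.1, 0.6, 1.2], "b": [1.1, 1.4, 0.2]}
traj_a = trajectory(signals["a"])
inv = build_inverted_index(signals)
```

Because later operations work only on the integer indexes, they could be tuned without touching the raw feature extraction, which is the separation the abstract points to.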
    • 9. Granted invention patent
    • Identifying key phrases within documents
    • US08423546B2
    • 2013-04-16
    • US12959840
    • 2010-12-03
    • Sorin Gherman; Kunal Mukerjee
    • Sorin Gherman; Kunal Mukerjee
    • G06F7/00; G06F17/30
    • G06F17/3053; G06F17/2715; G06F17/2745; G06F17/30864
    • The present invention extends to methods, systems, and computer program products for identifying key phrases within documents. Embodiments of the invention include using a tag index to determine what a document primarily relates to. For example, an integrated data flow and extract-transform-load pipeline crawls, parses, and word-breaks large corpora of documents in database tables. Documents can be broken into tuples. The tuples can be sent to a heuristically based algorithm that uses statistical language models and weight+cross-entropy threshold functions to summarize the document into its “top N” most statistically significant phrases. Accordingly, embodiments of the invention scale efficiently (e.g., linearly) and (potentially large numbers of) documents can be characterized by salient and relevant key phrases (tags).
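One way to picture the "top N" selection described above is the following sketch. It is not the patented algorithm: it scores single words rather than phrases or tuples, uses an add-one-smoothed unigram background model, and treats each term's contribution to the relative entropy between document and background as its score. The function name `top_n_terms`, the threshold of 0.0, and the sample texts are hypothetical.

```python
import math
from collections import Counter


def top_n_terms(doc_tokens, background_counts, n=3, threshold=0.0):
    """Return up to n terms whose score exceeds threshold, highest score first.

    score(term) = p_doc * log2(p_doc / p_bg), the term's share of the
    divergence between the document and the background language model.
    """
    doc_counts = Counter(doc_tokens)
    doc_total = sum(doc_counts.values())
    vocab = set(background_counts) | set(doc_counts)
    # Add-one smoothing so terms unseen in the background get nonzero probability.
    bg_total = sum(background_counts.values()) + len(vocab)
    scored = []
    for term, count in doc_counts.items():
        p_doc = count / doc_total
        p_bg = (background_counts.get(term, 0) + 1) / bg_total
        score = p_doc * math.log2(p_doc / p_bg)
        if score > threshold:
            scored.append((score, term))
    # Sort by descending score, breaking ties alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:n]]


background = Counter("the cat sat on the mat and the dog sat on the log".split())
doc = "the suffix tree indexes the suffix array".split()
tags = top_n_terms(doc, background, n=2)
```

Each document is scored in a single pass over its tokens, so the cost grows linearly with corpus size, consistent with the scaling claim in the abstract.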
    • 10. Granted invention patent
    • Uncertainty interval content sensing within communications
    • US08209175B2
    • 2012-06-26
    • US11449354
    • 2006-06-08
    • Kunal Mukerjee; Rafael Ballesteros
    • Kunal Mukerjee; Rafael Ballesteros
    • G10L15/04; G10L21/00
    • G06Q30/02
    • Repetition of content words in a communication is used to increase the certainty, or, alternatively, reduce the uncertainty, that the content words were actual words from the communication. Reducing the uncertainty of a particular content word of a communication in turn increases the likelihood that the content word is relevant to the communication. Reliable, relevant content words mined from a communication can be used for, e.g., automatic internet searches for documents and/or web sites pertinent to the communication. Reliable, relevant content words mined from a communication can also, or alternatively, be used to automatically generate one or more documents from the communication, e.g., communication summaries, communication outlines, etc.
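The claim that repetition reduces uncertainty can be made concrete with a small sketch. The model here is an assumption, not taken from the patent: each recognized occurrence of a word carries an independent confidence p, and the combined certainty after several occurrences is 1 minus the product of the individual uncertainties (1 - p). The names `combined_certainty` and `reliable_words`, the 0.9 cutoff, and the sample data are hypothetical.

```python
from collections import defaultdict


def combined_certainty(confidences):
    """Certainty that a word was really spoken, assuming independent recognitions."""
    uncertainty = 1.0
    for p in confidences:
        uncertainty *= 1.0 - p
    return 1.0 - uncertainty


def reliable_words(recognized, min_certainty=0.9):
    """Group (word, confidence) pairs by word; keep words whose combined certainty passes the cutoff."""
    by_word = defaultdict(list)
    for word, confidence in recognized:
        by_word[word].append(confidence)
    result = {}
    for word, confidences in by_word.items():
        certainty = combined_certainty(confidences)
        if certainty >= min_certainty:
            result[word] = certainty
    return result


recognized = [("budget", 0.7), ("meeting", 0.6), ("budget", 0.8), ("banana", 0.5)]
result = reliable_words(recognized)
```

Here "budget" is heard twice with moderate confidence and clears the cutoff, while the single-occurrence words do not. The surviving words are the kind of reliable content words the abstract proposes feeding into automatic searches or summaries.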