    • 1. Patent application
    • WEBPAGE ENTITY EXTRACTION THROUGH JOINT UNDERSTANDING OF PAGE STRUCTURES AND SENTENCES
    • US20110078554A1
    • 2011-03-31
    • US12569912
    • 2009-09-30
    • Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
    • G06F17/21
    • G06F17/278
    • Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
    • 2. Granted patent
    • Webpage entity extraction through joint understanding of page structures and sentences
    • US09092424B2
    • 2015-07-28
    • US12569912
    • 2009-09-30
    • Zaiqing Nie, Yong Cao, Ji-Rong Wen, Chunyu Yang
    • G06F17/00; G06F17/27
    • G06F17/278
    • Described is a technology for understanding entities of a webpage, e.g., to label the entities on the webpage. An iterative and bidirectional framework processes a webpage, including a text understanding component (e.g., extended Semi-CRF model) that provides text segmentation features to a structure understanding component (e.g., extended HCRF model). The structure understanding component uses the text segmentation features and visual layout features of the webpage to identify a structure (e.g., labeled block). The text understanding component in turn uses the labeled block to further understand the text. The process continues iteratively until a similarity criterion is met, at which time the entities may be labeled. Also described is the use of multiple mentions of a set of text in the webpage to help in labeling an entity.
    • 3. Granted patent
    • Interactive framework for name disambiguation
    • US08538898B2
    • 2013-09-17
    • US13118404
    • 2011-05-28
    • Zhengdong Lu, Zaiqing Nie, Gang Luo, Yong Cao, Ji-Rong Wen, Wei-Ying Ma
    • G06N5/00
    • G06N99/005; G06F17/30616
    • A “Name Disambiguator” provides various techniques for implementing an interactive framework for resolving or disambiguating entity names (associated with objects such as publications) for entity searches where two or more same or similar names may refer to different entities. More specifically, the Name Disambiguator uses a combination of user input and automatic models to address the disambiguation problem. In various embodiments, the Name Disambiguator uses a two-part process, including: 1) a global SVM trained from large sets of documents or objects in a simulated interactive mode, and 2) further personalization of local SVM models (associated with individual names or groups of names such as, for example, a group of coauthors) derived from the global SVM model. The result of this process is that large sets of documents or objects are rapidly and accurately condensed or clustered into ordered sets that are organized by entity names.
    • 4. Patent application
    • INTERACTIVE FRAMEWORK FOR NAME DISAMBIGUATION
    • US20120303557A1
    • 2012-11-29
    • US13118404
    • 2011-05-28
    • Zhengdong Lu, Zaiqing Nie, Gang Luo, Yong Cao, Ji-Rong Wen, Wei-Ying Ma
    • G06F15/18
    • G06N99/005; G06F17/30616
    • A “Name Disambiguator” provides various techniques for implementing an interactive framework for resolving or disambiguating entity names (associated with objects such as publications) for entity searches where two or more same or similar names may refer to different entities. More specifically, the Name Disambiguator uses a combination of user input and automatic models to address the disambiguation problem. In various embodiments, the Name Disambiguator uses a two-part process, including: 1) a global SVM trained from large sets of documents or objects in a simulated interactive mode, and 2) further personalization of local SVM models (associated with individual names or groups of names such as, for example, a group of coauthors) derived from the global SVM model. The result of this process is that large sets of documents or objects are rapidly and accurately condensed or clustered into ordered sets that are organized by entity names.
    • 7. Granted patent
    • Hierarchical conditional random fields for web extraction
    • US07720830B2
    • 2010-05-18
    • US11461400
    • 2006-07-31
    • Ji-Rong Wen, Wei-Ying Ma, Zaiqing Nie, Jun Zhu
    • G06F7/00; G06F17/30; G06F17/00; G06F15/173
    • G06F17/3089; G06F17/30994
    • A method and system for labeling object information of an information page is provided. A labeling system identifies an object record of an information page based on the labeling of object elements within an object record and labels object elements based on the identification of an object record that contains the object elements. To identify the records and label the elements, the labeling system generates a hierarchical representation of blocks of an information page. The labeling system identifies records and elements within the records by propagating probability-related information of record labels and element labels through the hierarchy of the blocks. The labeling system generates a feature vector for each block to represent the block and calculates a probability of a label for a block being correct based on a score derived from the feature vectors associated with related blocks. The labeling system searches for the labeling of records and elements that has the highest probability of being correct.
    • 8. Granted patent
    • Information classification paradigm
    • US07529748B2
    • 2009-05-05
    • US11276818
    • 2006-03-15
    • Ji-Rong Wen, Yan-Feng Sun, Wei-Ying Ma, Zaiqing Nie, Renkuan Jiang
    • G06F17/30
    • G06F17/30707; Y10S707/99933; Y10S707/99937
    • A mechanism to classify source documents into one of two categories, either likely to contain desired information or unlikely to contain desired information. Generally some form of rules based classification in conjunction with deeper analysis using advanced techniques on difficult cases is utilized. The rules based classification is generally good for eliminating cases from further consideration and for identifying documents of interest based on generally discernable relationships between data or based on the presence or absence of data. The deeper analysis is used to uncover more complex relationships between data that may identify documents of interest. Portions of the process may use the entire document while other portions of the process may use only a portion of the document.
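
The two-stage scheme in the abstract of entry 8 (Information classification paradigm) can be illustrated with a minimal sketch. It is not the patented implementation: the rules, the cue-word score standing in for the "deeper analysis", the threshold, and every name below (rule_no_price, rule_cart_marker, deep_score, classify, RULES) are assumptions made for illustration only. Cheap rules run first on the whole document, and only undecided cases reach the costlier step, which looks at just the leading portion of the text.

```python
from typing import Callable, Optional, Sequence

# Hypothetical rule type: True = likely, False = unlikely, None = undecided.
Rule = Callable[[str], Optional[bool]]


def rule_no_price(doc: str) -> Optional[bool]:
    """Eliminate documents with no currency sign at all (absence of data)."""
    return False if "$" not in doc else None


def rule_cart_marker(doc: str) -> Optional[bool]:
    """Accept documents that carry an explicit purchase marker (presence of data)."""
    return True if "add to cart" in doc.lower() else None


RULES: Sequence[Rule] = (rule_no_price, rule_cart_marker)


def deep_score(text: str) -> float:
    """Stand-in for the deeper analysis: fraction of tokens that are cue words."""
    cues = {"price", "buy", "shipping", "sale"}
    tokens = [t.strip(".,:;!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return sum(t in cues for t in tokens) / len(tokens)


def classify(doc: str, rules: Sequence[Rule] = RULES,
             threshold: float = 0.1, head_chars: int = 200) -> bool:
    """Return True if the document is likely to contain the desired information."""
    # Stage 1: rule-based pass over the entire document.
    for rule in rules:
        verdict = rule(doc)
        if verdict is not None:
            return verdict
    # Stage 2: deeper analysis, applied only to a portion of the document.
    return deep_score(doc[:head_chars]) >= threshold
```

For example, classify("Contact us for details.") is settled by the first rule alone, while classify("Sale price $5 with free shipping") passes through both rules undecided and is resolved by the scoring step.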