    • 61. Granted invention patent
    • Systems and methods for converting legacy and proprietary documents into extended mark-up language format
    • US07730396B2
    • 2010-06-01
    • US11598083
    • 2006-11-13
    • Boris Chidlovskii; Hervé Dejean
    • Boris Chidlovskii; Hervé Dejean
    • G06F17/22
    • G06F17/30914; G06F17/227
    • A system and method that converts legacy and proprietary documents into extended mark-up language format which treats the conversion as transforming ordered trees of one schema and/or model into ordered trees of another schema and/or model. In embodiments, the tree transformers are coded using a learning method that decomposes the converting task into three components which include path re-labeling, structural composition and input tree traversal, each of which involves learning approaches. The transformation of an input tree into an output tree may involve decomposing the input document, labeling components in the input tree with valid labels or paths from a particular output schema, composing the labeled elements into the output tree with a valid structure, and finding such a traversal of the input tree that achieves the correct composition of the output tree and applies structural rules.
    • 62. Granted invention patent
    • System and method for structured document authoring
    • US07296223B2
    • 2007-11-13
    • US10607667
    • 2003-06-27
    • Boris Chidlovskii; Hervé Déjean
    • Boris Chidlovskii; Hervé Déjean
    • G06F15/00; G06F17/00
    • G06F17/2282; G06F17/2247; G06F17/24; Y10S707/99943
    • A method for creating a structured document, wherein a structured document comprises a plurality of content elements wrapped in pairs of tags, includes parsing a document of a particular type containing content into a plurality of content elements; and for each content element, suggesting an optimal tag according to a tag suggestion procedure. The tag suggestion procedure includes providing sample data which has been converted into a structured sample document; deriving a set of tags from the structured sample document; evaluating the set of tags according to tag suggestion criteria to determine an optimal tag for the content element. The optimal tag may be a single tag or a pattern of tags which maximizes a similarity function with patterns found in the sample data.
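The tag suggestion procedure in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the sample document, the helper names, and the similarity function (a character-shape comparison using Python's `difflib`) are not taken from the patent, which does not specify them here.

```python
import re
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical structured sample document (illustrative data only).
SAMPLE = (
    "<title>Annual Report 2005</title>"
    "<author>Jane Doe</author>"
    "<date>2005-03-01</date>"
)


def derive_tag_samples(structured):
    """Derive the set of tags, with their sample contents, from the sample document."""
    samples = defaultdict(list)
    for tag, text in re.findall(r"<(\w+)>([^<]*)</\1>", structured):
        samples[tag].append(text)
    return samples


def shape(text):
    """Reduce text to a coarse signature: letter runs become 'a', digit runs '9'."""
    return re.sub(r"\d+", "9", re.sub(r"[A-Za-z]+", "a", text))


def suggest_tag(content, samples):
    """Return the tag whose sample content is most similar in shape to `content`."""
    def score(tag):
        return max(
            SequenceMatcher(None, shape(content), shape(s)).ratio()
            for s in samples[tag]
        )
    return max(samples, key=score)


samples = derive_tag_samples(SAMPLE)
print(suggest_tag("2006-11-13", samples))
print(suggest_tag("John Smith", samples))
```

The shape comparison is only a stand-in for whatever tag suggestion criteria an actual system would use; it also suggests single tags only, not patterns of tags.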
    • 63. Invention patent application
    • Interactive learning-based document annotation
    • US20070150801A1
    • 2007-06-28
    • US11316771
    • 2005-12-23
    • Boris Chidlovskii; Thierry Jacquin
    • Boris Chidlovskii; Thierry Jacquin
    • G06F17/00; G06F15/00
    • G06F17/241; G06F17/2247; G06K9/6254
    • A document annotation system 10 includes a graphical user interface 22 used by an annotator 30 to annotate documents. An active learning component 24 trains an annotation model and proposes annotations to documents based on the annotation model. A request handler 26, 32, 34, 42 conveys annotation requests from the graphical user interface 22 to the active learning component 24, conveys proposed annotations from the active learning component 24 to the graphical user interface 22, and selectably conveys evaluation requests from the graphical user interface 22 to a domain expert 40. During annotation, at least some low probability proposed annotations are presented to the annotator 30 by the graphical user interface 22. The presented low probability proposed annotations enhance training of the annotation model by the active learning component 24.
    • 65. Invention patent application
    • Method for automatic wrapper repair
    • US20060085468A1
    • 2006-04-20
    • US11295367
    • 2005-12-05
    • Boris Chidlovskii
    • Boris Chidlovskii
    • G06F17/30
    • G06F17/30893; Y10S707/99931; Y10S707/99932; Y10S707/99942; Y10S707/99945; Y10S707/99948
    • A method of information extraction from a Web page using an initial wrapper which has become partially inoperative, wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information, includes using the initial set of rules to extract strings from the Web page parsed in the forward direction; analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper; assigning labels to those strings which satisfy the label rules; using the initial set of rules to extract strings from the Web page in the backward (opposite) direction; analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and assigning labels to those unlabeled strings which satisfy the label rules.
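The forward and backward passes described in the wrapper repair abstract above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the patented method: the `WRAPPER` rule format (a left delimiter, a right delimiter, and a content check per label), the function names, and the sample page are all hypothetical.

```python
import re

# Hypothetical wrapper: per label, a left delimiter, a right delimiter, and a
# content check used as the label rule. All rules here are illustrative.
WRAPPER = {
    "title": ("<h1>", "</h1>", re.compile(r".+")),
    "price": ("<b>", "</b>", re.compile(r"\d+\.\d{2}")),
}


def forward_pass(page, wrapper):
    """Scan left to right: find each left delimiter, read up to the next tag."""
    labeled = {}
    for label, (left, _right, check) in wrapper.items():
        start = page.find(left)
        if start == -1:
            continue  # this rule no longer fires on the changed page
        start += len(left)
        end = page.find("<", start)
        text = page[start:end] if end != -1 else page[start:]
        if check.fullmatch(text):
            labeled[label] = text
    return labeled


def backward_pass(page, wrapper, labeled):
    """Scan right to left: find each right delimiter, read back to the previous tag."""
    repaired = dict(labeled)
    for label, (_left, right, check) in wrapper.items():
        if label in repaired:
            continue  # already labeled by the forward pass
        end = page.rfind(right)
        if end == -1:
            continue
        start = page.rfind(">", 0, end) + 1
        text = page[start:end]
        if check.fullmatch(text):
            repaired[label] = text
    return repaired


# The page layout changed: the opening <b> tag gained an attribute, so the
# forward rule for "price" fails while the closing delimiter still matches.
changed_page = '<h1>Dune</h1><b class="p">9.99</b>'
forward = forward_pass(changed_page, WRAPPER)
repaired = backward_pass(changed_page, WRAPPER, forward)
print(forward)
print(repaired)
```

In this sketch the forward pass recovers only the title, and the backward pass fills in the price from the unchanged closing delimiter, which is the kind of partial failure the abstract describes.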