    • 2. Patent application
    • APPARATUS AND METHODS FOR CONCEPT-CENTRIC INFORMATION EXTRACTION
    • US20100241639A1
    • 2010-09-23
    • US12408450
    • 2009-03-20
    • Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S. Kirpal, Philip L. Bohannon, Raghu Ramakrishnan
    • G06F17/30
    • G06F16/345, G06F16/313
    • Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain. Extraction of the structured data instances is accomplished by (i) using the domain-specific concept labeler to annotate a subset of nodes of the tree instances; and (ii) using a locally adaptive concept annotator to extract the structured data instances based on the annotated segments and the local properties associated with such annotated segments. The extracted structured data instance is stored as structured output records in a database.
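The two-step extraction in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented method: the `Node` class, the `css` attribute used as the "local property", the `concept_labeler` rules, and the `SCHEMA` tuple are all hypothetical names chosen for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    tag: str
    text: str = ""
    css: str = ""  # assumed local presentation property of the web object
    children: List["Node"] = field(default_factory=list)
    label: Optional[str] = None

KNOWN_NAMES = {"Latte"}  # hypothetical domain dictionary

def concept_labeler(node):
    """Hypothetical high-precision labeler: known names and dollar prices only."""
    if node.text in KNOWN_NAMES:
        return "name"
    if re.fullmatch(r"\$\d+(\.\d{2})?", node.text):
        return "price"
    return None

def walk(node):
    """Yield the node and all of its descendants, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)

def annotate(root):
    nodes = list(walk(root))
    # Step (i): the domain labeler annotates a subset of the nodes.
    for n in nodes:
        n.label = concept_labeler(n)
    # Step (ii): adapt to this page by learning which css class carries
    # each label, then propagate labels to nodes the labeler missed.
    css_to_label = {n.css: n.label for n in nodes if n.label and n.css}
    for n in nodes:
        if n.label is None and n.css in css_to_label:
            n.label = css_to_label[n.css]

SCHEMA = ("name", "price")  # assumed concept schema

def extract(root):
    """Return one record per node whose children cover every schema field."""
    annotate(root)
    records = []
    for n in walk(root):
        fields = {c.label: c.text for c in n.children if c.label}
        if all(k in fields for k in SCHEMA):
            records.append({k: fields[k] for k in SCHEMA})
    return records

def item(name, price):
    return Node("li", children=[Node("span", name, "n"), Node("span", price, "p")])

page = Node("ul", children=[item("Latte", "$3.50"), item("Mocha", "$4"), item("Water", "Free")])
records = extract(page)
```

In this toy page the labeler recognizes only "Latte" and the two dollar amounts; the page-local css classes let the second step label "Mocha", "Water", and "Free" as well.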
    • 3. Patent application
    • TRANSDUCTIVE APPROACH TO CATEGORY-SPECIFIC RECORD ATTRIBUTE EXTRACTION
    • US20100274770A1
    • 2010-10-28
    • US12429442
    • 2009-04-24
    • Rahul Gupta, Sathiya Keerthi Selvaraj, Daniel Kifer, Srujana Merugu
    • G06F17/30
    • G06F16/951, G06F16/285
    • Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.
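The pipeline in the abstract above (partial labeling, conflict resolution, segment expansion, model training, final labeling) can be sketched as below. Everything here is an assumption made for illustration: the two labelers, the `shape` feature, the rightward-only expansion rule, and the majority-vote table that stands in for the statistical model are not taken from the patent.

```python
import re
from collections import Counter, defaultdict

NAME_DICT = {"Pizza"}  # hypothetical high-precision dictionary

def name_labeler(tok):
    return "NAME" if tok in NAME_DICT else None

def phone_labeler(tok):
    return "PHONE" if re.fullmatch(r"\d{3}-\d{4}", tok) else None

def shape(token):
    """Local token feature: collapse runs of letters to 'a' and digits to 'd'."""
    out = []
    for c in token:
        s = "d" if c.isdigit() else "a" if c.isalpha() else c
        if not out or out[-1] != s:
            out.append(s)
    return "".join(out)

def partial_label(seq, labelers):
    """Label tokens; resolve conflicts by dropping tokens the labelers disagree on."""
    labels = []
    for tok in seq:
        votes = {lab(tok) for lab in labelers} - {None}
        labels.append(votes.pop() if len(votes) == 1 else None)
    return labels

def expand(seq, labels):
    """Extend a labeled segment rightward over unlabeled tokens of the same shape."""
    labels = list(labels)
    for i in range(1, len(seq)):
        if labels[i] is None and labels[i - 1] is not None \
                and shape(seq[i]) == shape(seq[i - 1]):
            labels[i] = labels[i - 1]
    return labels

def train(collection, partial):
    """Toy stand-in for the statistical model: majority label per token shape."""
    counts = defaultdict(Counter)
    for seq, labels in zip(collection, partial):
        for tok, lab in zip(seq, labels):
            if lab is not None:
                counts[shape(tok)][lab] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

def predict(seq, model):
    return [model.get(shape(tok), "OTHER") for tok in seq]

collection = [["Pizza", "Hut", "555-1234"], ["Taco", "Bell", "555-9876"]]
labelers = [name_labeler, phone_labeler]
partial = [expand(seq, partial_label(seq, labelers)) for seq in collection]
model = train(collection, partial)
labeled = [predict(seq, model) for seq in collection]
```

The model is trained on the same collection it later labels, which is the transductive aspect named in the title: the second sequence has no dictionary hits for its name tokens, yet receives `NAME` labels from statistics gathered on the first.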
    • 4. Granted patent
    • Extracting rich temporal context for business entities and events
    • US08606564B2
    • 2013-12-10
    • US12917389
    • 2010-11-01
    • Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
    • G06F17/27, G06F17/30
    • G06F17/30864, G06F17/271, G06F17/277
    • Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
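The label-then-structure flow in the abstract above can be sketched as follows. The temporal classes (`DATE`, `TIME_RANGE`), the regular expressions, and the output field names are hypothetical; the abstract does not specify them. Each entry in `SCHEMATIC_RULES` plays the role of a schematic rule: it belongs to one class and states the structure in which data of that class is stored.

```python
import re

# Hypothetical temporal classes and patterns, assumed for illustration only.
LABELERS = {
    "DATE": re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),
    "TIME_RANGE": re.compile(r"\b(\d{1,2})(am|pm)-(\d{1,2})(am|pm)\b"),
}

def label_segments(text):
    """Classify matching segments of the text into temporal classes, in text order."""
    segments = []
    for label, pattern in LABELERS.items():
        for m in pattern.finditer(text):
            segments.append((m.start(), label, m))
    segments.sort(key=lambda s: s[0])
    return [(label, m) for _, label, m in segments]

def to_24h(hour, meridiem):
    """Convert a 12-hour clock hour to 24-hour form."""
    hour = int(hour) % 12
    return hour + 12 if meridiem == "pm" else hour

# One schematic rule per class: the structure used to store that class.
SCHEMATIC_RULES = {
    "DATE": lambda m: {
        "year": int(m.group(1)),
        "month": int(m.group(2)),
        "day": int(m.group(3)),
    },
    "TIME_RANGE": lambda m: {
        "start_hour": to_24h(m.group(1), m.group(2)),
        "end_hour": to_24h(m.group(3), m.group(4)),
    },
}

def extract_temporal(text):
    """Label segments, then apply the rule for each segment's class."""
    return [
        {"class": label, **SCHEMATIC_RULES[label](m)}
        for label, m in label_segments(text)
    ]

records = extract_temporal("Sale on 2010-11-01, open 9am-5pm.")
```

Keeping the rules in a table keyed by class means a new class of temporal data needs only a new pattern and a new rule, without changes to the extraction loop.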
    • 5. Patent application
    • EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS
    • US20120109637A1
    • 2012-05-03
    • US12917389
    • 2010-11-01
    • Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
    • G06F17/27, G06F17/30
    • G06F17/30864, G06F17/271, G06F17/277
    • Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.