Quick Patent Search: search global patents quickly, free patent database for commercial use - IPRDB

1. Granted invention patent

US09141966B2 Opinion aggregation system (in force)
Publication No.: US09141966B2
Publication date: 2015-09-22
Application No.: US12646574
Filing date: 2009-12-23
Applicants: Srujana Merugu, Arun Shankar Iyer, Ashwin Kumar V. Machanavajjhala, Sathiya Keerthi Selvaraj, Philip L. Bohannon
Inventors: Srujana Merugu, Arun Shankar Iyer, Ashwin Kumar V. Machanavajjhala, Sathiya Keerthi Selvaraj, Philip L. Bohannon
IPC classification: G06F9/44, G06N7/02, G06N7/06, G06Q30/02, G06K9/62, G06N7/00, G06N5/04, G06Q10/06, G06Q50/00
CPC classification: G06Q30/0203, G06K9/6277, G06K9/6278, G06N5/04, G06N7/005, G06Q10/06, G06Q50/01
Abstract: A system is disclosed for obtaining and aggregating opinions generated by multiple sources with respect to one or more objects. The disclosed system uses observed variables associated with an opinion and a probabilistic model to estimate latent properties of that opinion. With those latent properties, the disclosed system may enable publishers to reliably and comprehensively present object information to interested users.

2. Invention patent application

US20100241639A1 APPARATUS AND METHODS FOR CONCEPT-CENTRIC INFORMATION EXTRACTION (under examination, published)
Publication No.: US20100241639A1
Publication date: 2010-09-23
Application No.: US12408450
Filing date: 2009-03-20
Applicants: Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S. Kirpal, Philip L. Bohannon, Raghu Ramakrishnan
Inventors: Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S. Kirpal, Philip L. Bohannon, Raghu Ramakrishnan
IPC classification: G06F17/30
CPC classification: G06F16/345, G06F16/313
Abstract: Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain. Extraction of the structured data instances is accomplished by (i) using the domain-specific concept labeler to annotate a subset of nodes of the tree instances; and (ii) using a locally adaptive concept annotator to extract the structured data instances based on the annotated segments and the local properties associated with such annotated segments. The extracted structured data instance is stored as structured output records in a database.

3. Invention patent application

US20100274770A1 TRANSDUCTIVE APPROACH TO CATEGORY-SPECIFIC RECORD ATTRIBUTE EXTRACTION (under examination, published)
Publication No.: US20100274770A1
Publication date: 2010-10-28
Application No.: US12429442
Filing date: 2009-04-24
Applicants: Rahul Gupta, Sathiya Keerthi Selvaraj, Daniel Kifer, Srujana Merugu
Inventors: Rahul Gupta, Sathiya Keerthi Selvaraj, Daniel Kifer, Srujana Merugu
IPC classification: G06F17/30
CPC classification: G06F16/951, G06F16/285
Abstract: Disclosed are methods and apparatus for segmenting and labeling a collection of token sequences. A plurality of segments of one or more tokens in a token sequence collection are partially labeled with labels from a set of target labels using high precision domain-specific labelers so as to generate a partially labeled sequence collection having a plurality of labeled segments and a plurality of unlabeled segments. Any label conflicts in the partially labeled sequence collection are resolved. One or more of the labeled segments of the partially labeled sequence collection are expanded so as to cover one or more additional tokens of the partially labeled sequence collection. A statistical model, for labeling segments using local token and segment features of the sequence collection, is trained based on the partially labeled sequence collection. This trained model is then used to label the unlabeled segments and the labeled segments of the sequence collection so as to generate a labeled sequence collection. The labeled sequence collection is then stored as structured output records in a database.

4. Granted invention patent

US08606564B2 Extracting rich temporal context for business entities and events (in force)
Publication No.: US08606564B2
Publication date: 2013-12-10
Application No.: US12917389
Filing date: 2010-11-01
Applicants: Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
Inventors: Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
IPC classification: G06F17/27, G06F17/30
CPC classification: G06F17/30864, G06F17/271, G06F17/277
Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.
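The two stages named in this abstract (labeling text segments with a class of temporal data, then applying a schematic rule per class to produce a structured record) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the class names `HOURS` and `DATE`, the regular expressions used as labelers, and the output field names are hypothetical and are not taken from the patent, which does not specify how labels are assigned.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical classes of temporal data, each with a pattern used as a labeler.
LABEL_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "HOURS": re.compile(r"\b(\d{1,2})\s*(am|pm)\s*-\s*(\d{1,2})\s*(am|pm)\b", re.I),
    "DATE": re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b"),
}


def label_segments(text: str) -> List[Tuple[str, "re.Match[str]"]]:
    """Stage 1: assign a temporal class label to matching segments of the text."""
    labeled = []
    for label, pattern in LABEL_PATTERNS.items():
        for match in pattern.finditer(text):
            labeled.append((label, match))
    # Return segments in the order they appear in the text.
    return sorted(labeled, key=lambda item: item[1].start())


def to_24h(hour: int, meridiem: str) -> int:
    """Convert a 12-hour clock value to a 24-hour value."""
    hour = hour % 12
    return hour + 12 if meridiem.lower() == "pm" else hour


def apply_schematic_rules(labeled: List[Tuple[str, "re.Match[str]"]]) -> List[dict]:
    """Stage 2: one rule per class decides the structure in which the data is stored."""
    records = []
    for label, m in labeled:
        if label == "HOURS":
            records.append({
                "class": "HOURS",
                "open": to_24h(int(m.group(1)), m.group(2)),
                "close": to_24h(int(m.group(3)), m.group(4)),
            })
        elif label == "DATE":
            records.append({
                "class": "DATE",
                "year": int(m.group(1)),
                "month": int(m.group(2)),
                "day": int(m.group(3)),
            })
    return records


def extract_temporal(text: str) -> List[dict]:
    """Run both stages on a piece of text."""
    return apply_schematic_rules(label_segments(text))
```

Calling `extract_temporal("Open 9am - 5pm, closed on 2010-11-01.")` yields one record per labeled segment, in text order. A real system would likely replace the regular expressions with a trained sequence labeler; the sketch only shows how labeling and schema mapping are kept separate.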

5. Invention patent application

US20120109637A1 EXTRACTING RICH TEMPORAL CONTEXT FOR BUSINESS ENTITIES AND EVENTS (in force)
Publication No.: US20120109637A1
Publication date: 2012-05-03
Application No.: US12917389
Filing date: 2010-11-01
Applicants: Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
Inventors: Srujana Merugu, Sathiya Keerthi Selvaraj, Vipul Agarwal, Arup Kumar Choudhury
IPC classification: G06F17/27, G06F17/30
CPC classification: G06F17/30864, G06F17/271, G06F17/277
Abstract: Methods and apparatus for performing computer-implemented extraction of temporal information for business entities and events are disclosed. In one embodiment, a sequence of text is obtained. A label is assigned to one or more of a plurality of segments of the text such that each of the one or more of the plurality of segments of the text is classified as temporal data in one of a plurality of classes of temporal data. One or more rules are applied to the one or more segments of the text that have been classified as temporal data to generate a structured representation of the temporal data, where the rules include one or more schematic rules. Each of the schematic rules pertains to one or more of the plurality of classes of temporal data and indicates a structure in which temporal data in the corresponding one or more of the plurality of classes is to be stored.

6. Granted invention patent

US08793239B2 Method and system for form-filling crawl and associating rich keywords (in force)
Publication No.: US08793239B2
Publication date: 2014-07-29
Application No.: US12576011
Filing date: 2009-10-08
Applicants: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
IPC classification: G06F17/30, G06F7/00
CPC classification: G06F17/30864
Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

7. Invention patent application

US20110113063A1 METHOD AND SYSTEM FOR BRAND NAME IDENTIFICATION (under examination, published)
Publication No.: US20110113063A1
Publication date: 2011-05-12
Application No.: US12615243
Filing date: 2009-11-09
Applicants: Bob Schulman, Sathiya Keerthi Selvaraj, Vinay Kakade, Mani Abrol, Amit Basu, Arun Shankar Iyer, Philip Bohannon
Inventors: Bob Schulman, Sathiya Keerthi Selvaraj, Vinay Kakade, Mani Abrol, Amit Basu, Arun Shankar Iyer, Philip Bohannon
IPC classification: G06F17/30
CPC classification: G06F16/907
Abstract: A method for identifying a brand name is described herein. The method involves obtaining category keywords associated with a category, designating a subgroup of the category keywords as brand name keywords for a particular brand name, receiving a search term, determining that the search term is a brand name keyword, and identifying the particular brand name corresponding to the brand name keyword.
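The steps listed in this abstract amount to a keyword lookup, which a short sketch can make concrete. The category, the keywords, the brand names, and the function name `identify_brand` are all hypothetical values chosen for illustration; the abstract does not say how keywords are obtained or how the brand subgroup is designated, so both are hard-coded here.

```python
from typing import Dict, Optional, Set

# Hypothetical category keywords for an assumed "running shoes" category.
CATEGORY_KEYWORDS: Set[str] = {
    "nike", "air max", "adidas", "ultraboost", "cushioned", "trail",
}

# Hypothetical designation: a subgroup of the category keywords mapped to brand names.
BRAND_KEYWORDS: Dict[str, str] = {
    "nike": "Nike",
    "air max": "Nike",
    "adidas": "Adidas",
    "ultraboost": "Adidas",
}


def identify_brand(search_term: str) -> Optional[str]:
    """Return the brand name for a search term, or None if it is not a brand name keyword."""
    # Normalize case and collapse whitespace so "  Air   Max " matches "air max".
    key = " ".join(search_term.lower().split())
    if key not in CATEGORY_KEYWORDS:
        return None
    # Category keywords outside the brand subgroup (e.g. "trail") map to no brand.
    return BRAND_KEYWORDS.get(key)
```

For example, `identify_brand("Air Max")` resolves to the brand that the keyword was designated for, while a generic category keyword such as `"trail"` resolves to `None`. Exact-match lookup is the simplest possible choice; a production system would presumably also handle misspellings and multi-keyword queries.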

8. Invention patent application

US20110087646A1 Method and System for Form-Filling Crawl and Associating Rich Keywords (in force)
Publication No.: US20110087646A1
Publication date: 2011-04-14
Application No.: US12576011
Filing date: 2009-10-08
Applicants: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal
IPC classification: G06F7/10, G06F17/30
CPC classification: G06F17/30864
Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

9. Granted invention patent

US08849790B2 Rapid iterative development of classifiers (in force)
Publication No.: US08849790B2
Publication date: 2014-09-30
Application No.: US12344132
Filing date: 2008-12-24
Applicants: Kedar Bellare, Srujana Merugu, Sathiya Keerthi Selvaraj
Inventors: Kedar Bellare, Srujana Merugu, Sathiya Keerthi Selvaraj
IPC classification: G06F17/30
CPC classification: G06F17/30265, G06F17/3028
Abstract: A classifier development process seamlessly and intelligently integrates different forms of human feedback on instances and features into the data preparation, learning and evaluation stages. A query utility based active learning approach is applicable to different types of editorial feedback. A bi-clustering based technique may be used to further speed up the active learning process.

10. Invention patent application

US20110099131A1 PAIRWISE RANKING-BASED CLASSIFIER (in force)
Publication No.: US20110099131A1
Publication date: 2011-04-28
Application No.: US12603763
Filing date: 2009-10-22
Applicants: Sundararajan Sellamanickam, Sathiya Keerthi Selvaraj, Priyanka Garg
Inventors: Sundararajan Sellamanickam, Sathiya Keerthi Selvaraj, Priyanka Garg
IPC classification: G06F15/18, G06N5/02
CPC classification: G06N99/005, G06F17/30707
Abstract: The present invention provides methods and systems for binary classification of items. Methods and systems are provided for constructing a machine learning-based and pairwise ranking method-based classification model for binary classification of items as positive or negative with regard to a single class, based on training using a training set of examples including positive examples and unlabelled examples. The model includes only one hyperparameter and only one threshold parameter, which are selected to optimize the model with regard to constraining positive items to be classified as positive while minimizing a number of unlabelled items classified as positive.
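One way to picture a pairwise ranking-based classifier trained on positive and unlabelled examples is the sketch below. It is not the patented method: it assumes a linear scoring function, a pairwise hinge loss over every (positive, unlabelled) pair, plain subgradient descent, and a threshold set to the lowest training-positive score. The names `reg`, `lr`, and `epochs` are hypothetical; `reg` plays the role of the single model hyperparameter, while `lr` and `epochs` are optimizer settings that the abstract does not mention.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_ranker(positives: List[Vector], unlabeled: List[Vector],
                          reg: float = 0.01, lr: float = 0.1,
                          epochs: int = 500) -> List[float]:
    """Learn weights w so that positives score higher than unlabelled items.

    Minimizes (reg / 2) * |w|^2 + mean over pairs of max(0, 1 - w.(p - u))
    by subgradient descent (an assumed optimizer, chosen for brevity).
    """
    dim = len(positives[0])
    w = [0.0] * dim
    n_pairs = len(positives) * len(unlabeled)
    for _ in range(epochs):
        grad = [reg * wi for wi in w]  # gradient of the regularizer
        for p in positives:
            for u in unlabeled:
                diff = [pi - ui for pi, ui in zip(p, u)]
                if dot(w, diff) < 1.0:  # pair violates the ranking margin
                    for i in range(dim):
                        grad[i] -= diff[i] / n_pairs
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


def choose_threshold(w: Vector, positives: List[Vector]) -> float:
    """Highest threshold that still classifies every training positive as positive."""
    return min(dot(w, p) for p in positives)


def classify(w: Vector, threshold: float, x: Vector) -> bool:
    return dot(w, x) >= threshold


# Toy data (hypothetical): the last unlabelled point is a hidden positive.
positives = [(2.0, 2.0), (3.0, 2.5), (2.5, 3.0)]
unlabeled = [(0.0, 0.5), (0.5, 0.0), (1.0, 0.2), (2.8, 2.9)]
w = train_pairwise_ranker(positives, unlabeled)
threshold = choose_threshold(w, positives)
```

Setting the threshold to the minimum positive score keeps every training positive on the positive side, and because any lower threshold could only admit more unlabelled items, it also minimizes the number of unlabelled items classified as positive under that constraint. On noisy data this choice is sensitive to outlying positives, so a held-out set would be a more robust basis for it.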
