    • 41. Patent application
    • SYSTEMS, METHODS AND DEVICES FOR EXTRACTING AND VISUALIZING USER-CENTRIC COMMUNITIES FROM EMAILS
    • US20120284381A1
    • 2012-11-08
    • US13099502
    • 2011-05-03
    • Boris Chidlovskii
    • Boris Chidlovskii
    • G06F15/173
    • G06Q10/10; H04L51/32
    • Embodiments generally relate to systems and methods for extracting and visualizing user-centric communities from emails. A set of email data comprising a set of users can be identified and a communication graph comprising a center node can be generated from the email data. The center node can be removed from the communication graph and a set of communities can be determined from the remaining data. The center node can be reconnected to a center of each of the set of communities to form a community graph. The links connecting the center node with the center of each of the set of communities can have a weight calculated according to a formula. The community graph can be visualized and provided to an administrator.
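The steps in the abstract of entry 41 (build a communication graph, remove the center node, find communities, reconnect the center with weighted links) can be sketched as follows. This is a minimal illustration, not the patented method: the abstract does not give the weight formula or the community detector, so the sketch assumes connected components as the communities, the member with the most intra-community traffic as each community's center, and the center user's total message count with the community as the link weight. All function names are hypothetical.

```python
from collections import defaultdict


def build_graph(emails):
    """emails: iterable of (sender, [recipients]); returns undirected edge counts."""
    g = defaultdict(lambda: defaultdict(int))
    for sender, recipients in emails:
        for r in recipients:
            if r != sender:
                g[sender][r] += 1
                g[r][sender] += 1
    return g


def communities_without(g, center):
    """Connected components after removing `center` (assumed stand-in detector)."""
    seen, comps = set(), []
    for start in sorted(g):
        if start == center or start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(m for m in g[n] if m != center and m not in comp)
        seen |= comp
        comps.append(comp)
    return comps


def community_graph(emails, center):
    """Return (center, community_hub, weight, members) for each community."""
    g = build_graph(emails)
    links = []
    for comp in communities_without(g, center):
        # Assumed community center: member with the most traffic inside the community.
        hub = max(sorted(comp),
                  key=lambda n: sum(w for m, w in g[n].items() if m in comp))
        # Assumed weight: messages exchanged between the center user and the community.
        weight = sum(g[center][m] for m in comp if m in g[center])
        links.append((center, hub, weight, sorted(comp)))
    return links
```

Calling `community_graph(emails, "me")` on a small mailbox returns one weighted link per community, which is the structure a visualization layer would draw.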
    • 42. Patent application
    • LOCAL METRIC LEARNING FOR TAG RECOMMENDATION IN SOCIAL NETWORKS
    • US20120219191A1
    • 2012-08-30
    • US13036209
    • 2011-02-28
    • Mohamed Aymen Benzarti; Boris Chidlovskii; Nishant Vijayakumar
    • Mohamed Aymen Benzarti; Boris Chidlovskii; Nishant Vijayakumar
    • G06K9/00; G06F15/16
    • G06Q30/0201; G06K9/00677; G06Q50/01
    • A tag recommendation for an item to be tagged is generated by: selecting a set of candidate neighboring items in an electronic social network based on context of items in the electronic social network respective to an owner of the item to be tagged; selecting a set of nearest neighboring items from the set of candidate neighboring items based on distances of the candidate neighboring items from the item to be tagged as measured by an item comparison metric; and selecting at least one tag recommendation based on tags of the items of the set of nearest neighboring items. The item comparison metric may comprise a Mahalanobis distance metric trained on the set of candidate neighboring items to correlate the trained Mahalanobis distance between pairs of items of the set of candidate neighboring items with an overlap metric indicative of overlap of the tag sets of the two items.
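The three selection steps in the abstract of entry 42 can be illustrated with a short sketch. It assumes the Mahalanobis matrix `M` has already been learned (the training step that correlates distance with tag-set overlap is omitted), and it assumes "context" means items owned by the user or the user's direct contacts. The item layout (`owner`, `features`, `tags`) and the function names are hypothetical.

```python
import math
from collections import Counter


def mahalanobis(x, y, M):
    """sqrt((x - y)^T M (x - y)) for a positive semidefinite matrix M."""
    d = [a - b for a, b in zip(x, y)]
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return math.sqrt(max(0.0, sum(d[i] * Md[i] for i in range(len(d)))))


def recommend_tags(features, owner, items, contacts, M, k=2, n_tags=1):
    # Step 1: candidate neighbors from the owner's social context (assumed:
    # items owned by the owner or by the owner's direct contacts).
    circle = {owner} | set(contacts.get(owner, ()))
    candidates = [it for it in items if it["owner"] in circle]
    # Step 2: k nearest candidates under the item comparison metric.
    nearest = sorted(candidates,
                     key=lambda it: mahalanobis(features, it["features"], M))[:k]
    # Step 3: recommend the tags most frequent among the nearest neighbors.
    votes = Counter(tag for it in nearest for tag in it["tags"])
    return [tag for tag, _ in votes.most_common(n_tags)]
```

With `M` set to the identity matrix the metric reduces to Euclidean distance, which is a convenient baseline before any metric is trained.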
    • 43. Patent application
    • Semi-supervised visual clustering
    • US20090018995A1
    • 2009-01-15
    • US11827770
    • 2007-07-13
    • Boris Chidlovskii; Loic Lecerf
    • Boris Chidlovskii; Loic Lecerf
    • G06F17/30
    • G06K9/6253; G06K9/622; G06K9/6251
    • A clustering system includes a visual mapping sub-system configured to display an N-dimensional to two- or three-dimensional mapping of items to be clustered, where N is greater than three, the mapping having mapping parameters for the N-dimensions. A user interface sub-system is configured to receive user inputted values for the mapping parameters, user inputted values selecting whether selected mapping parameters are fixed or adjustable, and user inputted values associating selected items with selected groups. An adjustment sub-system is configured to adjust the adjustable mapping parameters, without adjusting any fixed mapping parameters, to improve a measure of distinctness of one or more groups of items in the two- or three-dimensional mapping.
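The adjustment sub-system in the abstract of entry 43 can be illustrated with a small sketch. The abstract does not say which mapping or distinctness measure is used, so this assumes a star-coordinates style projection (one weight per dimension, axes spread evenly around a circle) and a ratio of between-group to within-group distance; fixed weights are left untouched while adjustable ones are tuned by a simple coordinate search. All names and the candidate grid are hypothetical.

```python
import math


def project(x, weights):
    """Map an N-dimensional item to 2-D using one weight per dimension."""
    n = len(x)
    px = sum(w * v * math.cos(2 * math.pi * i / n)
             for i, (w, v) in enumerate(zip(weights, x)))
    py = sum(w * v * math.sin(2 * math.pi * i / n)
             for i, (w, v) in enumerate(zip(weights, x)))
    return px, py


def distinctness(items, groups, weights):
    """Between-centroid distance divided by mean within-group spread (assumed measure)."""
    pts = {}
    for x, g in zip(items, groups):
        pts.setdefault(g, []).append(project(x, weights))
    cents = {g: (sum(p[0] for p in ps) / len(ps), sum(p[1] for p in ps) / len(ps))
             for g, ps in pts.items()}
    spread = sum(math.dist(p, cents[g]) for g, ps in pts.items() for p in ps) / len(items)
    labels = sorted(cents)
    between = sum(math.dist(cents[a], cents[b])
                  for i, a in enumerate(labels) for b in labels[i + 1:])
    return between / (spread + 1e-9)


def adjust(items, groups, weights, fixed, grid=(0.0, 0.5, 1.0), sweeps=2):
    """Tune only the adjustable weights; indices in `fixed` are never changed."""
    weights = list(weights)
    for _ in range(sweeps):
        for i in range(len(weights)):
            if i in fixed:
                continue
            # Keeping the current value as a candidate means the score never decreases.
            candidates = (weights[i],) + tuple(grid)
            weights[i] = max(candidates, key=lambda w: distinctness(
                items, groups, weights[:i] + [w] + weights[i + 1:]))
    return weights
```

In an interactive tool, `fixed` would hold the indices of the parameters the user has pinned, and `groups` would hold the group assignments the user has made so far.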
    • 44. Granted patent
    • Method for automatic wrapper repair
    • US07440974B2
    • 2008-10-21
    • US11295367
    • 2005-12-05
    • Boris Chidlovskii
    • Boris Chidlovskii
    • G06F17/30
    • G06F17/30893; Y10S707/99931; Y10S707/99932; Y10S707/99942; Y10S707/99945; Y10S707/99948
    • A method of information extraction from a Web page using an initial wrapper which has become partially inoperative, wherein the initial wrapper comprises an initial set of rules for extracting information and for assigning labels from a wrapper set of labels to the extracted information, includes: using the initial set of rules to extract strings from the Web page parsed in the forward direction; analyzing the extracted strings according to the initial set of rules for assigning labels associated with the wrapper; assigning labels to those strings which satisfy the label rules; using the initial set of rules to extract strings from the Web page in the backward (opposite) direction; analyzing the extracted strings according to the set of rules for assigning labels associated with the wrapper; and assigning labels to those unlabeled strings which satisfy the label rules.
    • 45. Patent application
    • Active learning methods for evolving a classifier
    • US20080147574A1
    • 2008-06-19
    • US11638732
    • 2006-12-14
    • Boris Chidlovskii
    • Boris Chidlovskii
    • G06F15/18
    • G06F17/30705; G06N99/005
    • A method and system are provided for classifying data items such as a document based upon identification of element instances within the data item. A training set of classes is provided where each class is associated with one or more features indicative of accurate identification of an element instance within the data item. Upon the identification of the data item with the training set, a confidence factor is computed that the selected element instance is accurately identified. When a selected element instance has a low confidence factor, the associated features for the predicted class are changed by an annotator/expert so that the changed class definition of the new associated feature provides a higher confidence factor of accurate identification of element instances within the data item.
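The confidence-driven loop in the abstract of entry 45 resembles uncertainty sampling, which can be sketched as below. This is a generic illustration rather than the patented procedure: it swaps in a one-dimensional nearest-centroid classifier, defines confidence from the two closest centroids, and has the annotator supply a label (the abstract instead describes the annotator changing the features associated with a class). The names, the threshold, and the `oracle` callback are hypothetical.

```python
def centroids(train):
    """train: list of (value, label) pairs; returns the mean value per label."""
    sums = {}
    for x, label in train:
        total, n = sums.get(label, (0.0, 0))
        sums[label] = (total + x, n + 1)
    return {label: total / n for label, (total, n) in sums.items()}


def predict(cents, x):
    """Return (label, confidence); needs at least two classes.

    Confidence is 0.5 when x is equidistant from the two nearest centroids
    and approaches 1.0 as x gets close to one of them.
    """
    ranked = sorted((abs(x - c), label) for label, c in cents.items())
    (d_best, label), (d_second, _) = ranked[0], ranked[1]
    total = d_best + d_second
    conf = d_second / total if total > 0 else 0.5
    return label, conf


def active_learning_round(train, pool, oracle, threshold=0.7):
    """Send low-confidence items to the annotator, then retrain."""
    cents = centroids(train)
    queried = []
    for x in pool:
        _, conf = predict(cents, x)
        if conf < threshold:
            train.append((x, oracle(x)))  # the annotator supplies the label
            queried.append(x)
    return centroids(train), queried
```

Only the items the current model is unsure about reach the annotator, which is the point of the approach: expert effort is spent where it changes the classifier most.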
    • 46. Patent application
    • Document alignment systems for legacy document conversions
    • US20070150443A1
    • 2007-06-28
    • US11315458
    • 2005-12-22
    • Andre Bergholz; Boris Chidlovskii
    • Andre Bergholz; Boris Chidlovskii
    • G06F17/30
    • G06F17/30569
    • A method for aligning documents which may be in different XML formats includes inputting source and target leaves of source and target documents in first and second tree structured formats and assigning a cost to each of a plurality of matches. Each match may include a source leaf and a target leaf or be an unmatched source or target leaf. Matches are identified for which a total cost is minimal, wherein each of the leaves is in at least one of the identified matches. From the identified matches, groups of two or more matches are identified which have a leaf in common. From the groups, probable matches are identified in which more than one target leaf is matched with at least one source leaf or more than one source leaf is matched with a target leaf. An alignment between leaves of the target document and leaves of the source document is output which includes the probable matches.
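The cost-minimizing step in the abstract of entry 46 can be illustrated on a toy input. The sketch below enumerates every set of candidate matches (leaf pairs plus unmatched leaves), keeps the cheapest set that covers every leaf, and then reports leaves that take part in more than one match. The exhaustive search is an assumption chosen for clarity and only suits a handful of leaves; the abstract does not say how the minimum is found. The function names and cost values are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def align(source, target, pair_cost, unmatched_cost):
    """Return (matches, total_cost) for the cheapest set covering every leaf."""
    options = [((s, t), pair_cost(s, t)) for s in source for t in target]
    options += [((s, None), unmatched_cost) for s in source]
    options += [((None, t), unmatched_cost) for t in target]
    leaves = {("s", s) for s in source} | {("t", t) for t in target}
    best, best_cost = None, float("inf")
    for r in range(1, len(options) + 1):
        for subset in combinations(options, r):
            covered = set()
            for (s, t), _ in subset:
                if s is not None:
                    covered.add(("s", s))
                if t is not None:
                    covered.add(("t", t))
            cost = sum(c for _, c in subset)
            if covered == leaves and cost < best_cost:
                best, best_cost = [m for m, _ in subset], cost
    return best, best_cost


def one_to_many(matches):
    """Group matches that share a leaf: one source with many targets, or the reverse."""
    by_source, by_target = defaultdict(list), defaultdict(list)
    for s, t in matches:
        if s is not None and t is not None:
            by_source[s].append(t)
            by_target[t].append(s)
    groups = {("source", s): ts for s, ts in by_source.items() if len(ts) > 1}
    groups.update({("target", t): ss for t, ss in by_target.items() if len(ss) > 1})
    return groups
```

For example, a source leaf `name` that is cheap to match with both `first` and `last` in the target ends up in two matches, and `one_to_many` surfaces it as a one-to-many alignment.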
    • 48. Patent application
    • Method for classifying sub-trees in semi-structured documents
    • US20060288275A1
    • 2006-12-21
    • US11156776
    • 2005-06-20
    • Boris Chidlovskii; Jerome Fuselier
    • Boris Chidlovskii; Jerome Fuselier
    • G06F17/00
    • G06F16/83
    • A method and system for classifying semi-structured documents by distinguishing sub-tree structural information as a distinct representative characteristic of a fragment of the document structure identified by a sub-tree node therein. The structural information comprises both an inner structure and an outer structure which individually can be exploited as representative data in a probabilistic classifier for classifying the sub-tree itself or the entire document. Additional representative feature data can also be independently used for classification and comprises the data content of the fragment structurally represented by the sub-tree and additionally with node attributes. The classification values independently generated from each of the different sets of features can then be combined in an assembly classifier to generate an automated classification system.
    • 50. Granted patent
    • System and method of automatic wrapper grammar generation
    • US06792576B1
    • 2004-09-14
    • US09361496
    • 1999-07-26
    • Boris Chidlovskii
    • Boris Chidlovskii
    • G06F15/00
    • G06F17/30569
    • A method for generating a wrapper grammar for a file having a structure of a particular format includes providing at least one sample file of the particular format, where the particular format comprises a plurality of string tokens. Each sample file includes a plurality of tokens (data strings) which may be actual data from the document, an HTML tag or some other grammatical separator. The sample file of the particular format is then processed by annotating attributable tokens with a user-defined attribute, such as Author, Title, etc. from a set of attributes to form an annotated sample set. The annotated sample set is then evaluated to determine if wrapper grammar generation is possible, and if it is possible, a wrapper grammar for the files having a structure of the particular format is generated. Preferably, the annotated sample set is evaluated by determining if all attributes in the annotated sample set are distinguishable from one another.