    • 2. Invention application
    • A SYSTEM FOR ESTIMATING A DISTRIBUTION OF MESSAGE CONTENT CATEGORIES IN SOURCE DATA
    • WO2008115519A1
    • 2008-09-25
    • PCT/US2008/003606
    • 2008-03-19
    • PRESIDENT AND FELLOWS OF HARVARD COLLEGE; KING, Gary; HOPKINS, Daniel; LU, Ying
    • KING, Gary; HOPKINS, Daniel; LU, Ying
    • G06F19/00
    • G06F17/30705; G06K9/6256; G06K9/6277; G06N99/005
    • A method of computerized content analysis that gives "approximately unbiased and statistically consistent estimates" of a distribution of elements of structured, unstructured, and partially structured source data among a set of categories. In one embodiment, this is done by analyzing the distribution of a small set of individually classified elements across a plurality of categories and then using the information determined from that analysis to extrapolate the distribution in a larger population set. This extrapolation is performed without constraining the distribution of the unlabeled elements to be equal to the distribution of the labeled elements, and without constraining the content distribution of elements in the labeled set (e.g., the distribution of words used by elements in the labeled set) to be equal to the content distribution of elements in the unlabeled set. Not being constrained in these ways allows the estimation techniques described herein to provide distinct advantages over conventional aggregation techniques.
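
The abstract does not spell out the estimator, so the following is only a minimal sketch of one way a category distribution could be extrapolated from a small labeled set without assuming the labeled and unlabeled sets share the same category mix or the same overall word distribution. It assumes two categories and assumes that the word-usage rates within each category are the same in both sets; the function names (`profile_rates`, `estimate_share`), the toy data, and the least-squares inversion are all illustrative assumptions, not the method claimed in the application.

```python
def profile_rates(docs, vocab):
    """Fraction of documents (given as sets of words) containing each vocab word."""
    n = len(docs)
    return [sum(1 for d in docs if w in d) / n for w in vocab]


def estimate_share(labeled, unlabeled, vocab, cat_a, cat_b):
    """Estimate the share of cat_a among the unlabeled documents.

    Assumption (hypothetical, for illustration): within each category,
    word-usage rates are the same in the labeled and unlabeled sets.
    """
    docs_a = [set(text.split()) for text, lab in labeled if lab == cat_a]
    docs_b = [set(text.split()) for text, lab in labeled if lab == cat_b]
    docs_u = [set(text.split()) for text in unlabeled]

    a = profile_rates(docs_a, vocab)  # word rates given category A
    b = profile_rates(docs_b, vocab)  # word rates given category B
    q = profile_rates(docs_u, vocab)  # word rates in the unlabeled set

    # Solve q ~= pi * a + (1 - pi) * b for pi by least squares.
    num = sum((qj - bj) * (aj - bj) for qj, aj, bj in zip(q, a, b))
    den = sum((aj - bj) ** 2 for aj, bj in zip(a, b))
    if den == 0:
        raise ValueError("categories are indistinguishable on this vocabulary")
    return min(1.0, max(0.0, num / den))  # clip to a valid proportion


# Toy data: the labeled set is split 50/50, the unlabeled set is not.
labeled = [
    ("good service", "pos"),
    ("good food", "pos"),
    ("bad service", "neg"),
    ("bad food", "neg"),
]
unlabeled = ["good day", "good trip", "good meal", "bad day"]

share = estimate_share(labeled, unlabeled, ["good", "bad"], "pos", "neg")
print(share)  # 0.75
```

In this toy run the labeled set is evenly split between the two categories, yet the estimate for the unlabeled set is 0.75, which illustrates the point in the abstract that the extrapolated distribution is not forced to match the labeled distribution.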