    • 1. Invention publication (published application)
    • LEARNING MULTIMEDIA SEMANTICS FROM LARGE-SCALE UNSTRUCTURED DATA
    • LERNEN VON MULTIMEDIA-SEMANTIK AUS UMFASSENDEN UNSTRUKTURIERTEN DATEN
    • EP3138051A1
    • 2017-03-08
    • EP15721467.7
    • 2015-04-24
    • Microsoft Technology Licensing, LLC
    • HUA, Xian-Sheng; LI, Jin; USHIKU, Yoshitaka
    • G06N99/00
    • G06F17/30705; G06F17/30675; G06F17/30864; G06N99/005
    • Systems and methods for learning topic models from unstructured data and applying the learned topic models to recognize semantics for new data items are described herein. In at least one embodiment, a corpus of multimedia data items associated with a set of labels may be processed to generate a refined corpus of multimedia data items associated with the set of labels. Such processing may include arranging the multimedia data items in clusters based on similarities of extracted multimedia features and generating intra-cluster and inter-cluster features. The intra-cluster and the inter-cluster features may be used for removing multimedia data items from the corpus to generate the refined corpus. The refined corpus may be used for training topic models for identifying labels. The resulting models may be stored and subsequently used for identifying semantics of a multimedia data item input by a user.
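
The refinement step described in the abstract (cluster the items of each label by feature similarity, derive intra-cluster and inter-cluster features, then remove items to obtain the refined corpus) can be illustrated with a rough sketch. The abstract does not say which clustering algorithm, features, or thresholds are used, so everything below is an assumption made for illustration: a tiny deterministic k-means, "share of the label's items in a cluster" as the inter-cluster feature, "distance to the cluster centroid" as the intra-cluster feature, and the hypothetical parameters `min_share` and `max_spread`. It is not the patented method.

```python
import math


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def kmeans(points, k, iters=10):
    """Tiny deterministic k-means (assumption: the first k points seed the centroids)."""
    k = min(k, len(points))
    centroids = [points[i] for i in range(k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = mean(members)
    return assign, centroids


def refine_label(points, k=2, min_share=0.3, max_spread=2.0):
    """Return the items of one label that survive both (hypothetical) filters."""
    assign, centroids = kmeans(points, k)
    kept = []
    for c in range(len(centroids)):
        members = [p for p, a in zip(points, assign) if a == c]
        if not members:
            continue
        # Inter-cluster feature (assumed): fraction of the label's items in this cluster.
        if len(members) / len(points) < min_share:
            continue  # small cluster, treated as noise for this label
        # Intra-cluster feature (assumed): distance of each item to its centroid.
        dists = [dist(p, centroids[c]) for p in members]
        avg = sum(dists) / len(dists)
        kept.extend(p for p, d in zip(members, dists) if avg == 0 or d <= max_spread * avg)
    return kept


def refine_corpus(corpus):
    """Apply the per-label refinement to a {label: [feature vectors]} corpus."""
    return {label: refine_label(points) for label, points in corpus.items()}


# Toy corpus: four similar items and one outlier under the hypothetical label "cat".
corpus = {"cat": [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]}
refined = refine_corpus(corpus)
```

In this toy run the isolated item ends up alone in its own cluster and is dropped by the cluster-share filter. A topic model for each label would then be trained on the refined corpus, a step the sketch does not cover.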