    • 2. Invention application
    • METHOD AND DEVICE FOR EXPANDING DATA OF BILINGUAL CORPUS, AND STORAGE MEDIUM
    • US20160239481A1
    • 2016-08-18
    • US14892933
    • 2014-09-04
    • BEIJING BAIDU NETCOM SCIENCE AND TECHNOLOGY CO., LTD.
    • Xiaoning Zhu; Zhongjun He; Hua Wu; Haifeng Wang
    • G06F17/27; G06F17/30; G06F17/28
    • G06F17/2735; G06F17/2827; G06F17/2845; G06F17/3043; G06F17/30489; G06F17/30654; G06F17/30669
    • Disclosed are a method and a device for expanding data of a bilingual corpus. The method for expanding data of a bilingual corpus includes: searching, in a source language-pivot language corpus, for at least one first pivot language phrase semantically matching a first source language phrase; searching, in the source language-pivot language corpus, for at least one second source language phrase semantically matching each of the first pivot language phrases to form a source language phrase set by the second source language phrases; searching, in a pivot language-target language corpus, for at least one first target language phrase semantically matching each of the first pivot language phrases to form a target language phrase set by the first target language phrases; combining the second source language phrases in the source language phrase set with the first target language phrases in the target language phrase set, so as to form at least one phrase pair in which a source language phrase and a target language phrase semantically match; and storing the formed at least one phrase pair in which the source language phrase and the target language phrase semantically match into a source language-target language corpus. Data in a bilingual corpus is expanded, so that the problem of data sparseness in the bilingual corpus is solved.
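
The steps in the abstract can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: it assumes each corpus is a collection of aligned phrase pairs, treats "semantically matching" as simply "aligned in the corpus", and combines the source and target phrase sets separately for each pivot phrase (the abstract does not say whether the sets are per pivot or pooled). The names `build_index` and `expand_corpus` and the sample phrases are hypothetical.

```python
from collections import defaultdict
from itertools import product


def build_index(pairs):
    """Map each left-hand phrase to the set of right-hand phrases aligned with it."""
    index = defaultdict(set)
    for left, right in pairs:
        index[left].add(right)
    return index


def expand_corpus(first_source, source_pivot, pivot_target, source_target):
    """Add pivot-derived (source, target) phrase pairs to source_target.

    source_pivot: iterable of (source_phrase, pivot_phrase) pairs
    pivot_target: iterable of (pivot_phrase, target_phrase) pairs
    source_target: set of (source_phrase, target_phrase) pairs, updated in place
    Assumption: two phrases "semantically match" when they are aligned in a corpus.
    """
    source_pivot = list(source_pivot)
    src_to_piv = build_index(source_pivot)
    piv_to_src = build_index((piv, src) for src, piv in source_pivot)
    piv_to_tgt = build_index(pivot_target)

    new_pairs = set()
    # Step 1: first pivot phrases matching the first source phrase.
    for pivot in src_to_piv.get(first_source, ()):
        # Step 2: second source phrases matching this pivot phrase.
        source_set = piv_to_src.get(pivot, set())
        # Step 3: first target phrases matching this pivot phrase.
        target_set = piv_to_tgt.get(pivot, set())
        # Step 4: combine both sets into source-target phrase pairs.
        new_pairs.update(product(source_set, target_set))

    # Step 5: store the formed pairs in the source-target corpus.
    source_target.update(new_pairs)
    return new_pairs


# Illustrative data only: Chinese source, English pivot, French target.
source_pivot = [("你好", "hello"), ("您好", "hello"), ("你好", "hi"), ("嗨", "hi")]
pivot_target = [("hello", "bonjour"), ("hi", "salut")]
source_target = set()

for pair in sorted(expand_corpus("你好", source_pivot, pivot_target, source_target)):
    print(pair)
```

In this sketch, a source phrase such as "您好" gains a target-language pairing even though it never appears in the source-target corpus, which is the data-sparseness effect the abstract describes. A real system would presumably score or filter candidate pairs rather than accept every combination.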