    • 1. Granted invention patent
    • Method for automatically identifying sentence boundaries in noisy conversational data
    • Publication number: US08364485B2
    • Publication date: 2013-01-29
    • Application number: US11845462
    • Filing date: 2007-08-27
    • Inventors: Tetsuya Nasukawa, Diwakar Punjani, Shourya Roy, L. Venkata Subramaniam, Hironori Takeuchi
    • Classification: G10L15/04, G10L15/26
    • Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
    • 2. Invention patent application
    • METHOD FOR AUTOMATICALLY IDENTIFYING SENTENCE BOUNDARIES IN NOISY CONVERSATIONAL DATA
    • Publication number: US20090063150A1
    • Publication date: 2009-03-05
    • Application number: US11845462
    • Filing date: 2007-08-27
    • Inventors: Tetsuya Nasukawa, Diwakar Punjani, Shourya Roy, L. Venkata Subramaniam, Hironori Takeuchi
    • Classification: G10L15/04, G10L15/26
    • Sentence boundaries in noisy conversational transcription data are automatically identified. Noise and transcription symbols are removed, and a training set is formed with sentence boundaries marked based on long silences or on manual markings in the transcribed data. Frequencies of head and tail n-grams that occur at the beginning and ending of sentences are determined from the training set. N-grams that occur a significant number of times in the middle of sentences in relation to their occurrences at the beginning or ending of sentences are filtered out. A boundary is marked before every head n-gram and after every tail n-gram occurring in the conversational data and remaining after filtering. Turns are identified. A boundary is marked after each turn, unless the turn ends with an impermissible tail word or is an incomplete turn. The marked boundaries in the conversational data identify sentence boundaries.
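The steps in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names (`train_ngrams`, `mark_boundaries`), the filtering threshold `max_mid_ratio`, and the `bad_tail_words` set are all hypothetical. The sketch assumes the input is already tokenized with noise and transcription symbols removed, and it omits the incomplete-turn test because the abstract does not define it.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_ngrams(sentences, n=2, max_mid_ratio=0.5):
    """Collect head/tail n-grams from a training set of marked sentences.

    An n-gram is dropped when its mid-sentence count exceeds
    max_mid_ratio times its head (or tail) count. The threshold is an
    assumed stand-in for the "significant number of times" criterion.
    """
    head, tail, mid = Counter(), Counter(), Counter()
    for sent in sentences:
        grams = ngrams(sent, n)
        if not grams:
            continue
        head[grams[0]] += 1
        tail[grams[-1]] += 1
        mid.update(grams[1:-1])

    def keep(counts):
        return {g for g, c in counts.items() if mid[g] <= max_mid_ratio * c}

    return keep(head), keep(tail)


def mark_boundaries(turns, heads, tails, n=2, bad_tail_words=frozenset()):
    """Split speaker turns into sentences.

    A boundary goes before every head n-gram and after every tail n-gram.
    A boundary also goes after each turn, unless the turn ends with a word
    in bad_tail_words; in that case the last segment is carried over and
    joined to the start of the next turn.
    """
    sentences, carry = [], []
    for turn in turns:
        if not turn:
            continue
        cuts = set()
        for i, gram in enumerate(ngrams(turn, n)):
            if gram in heads:
                cuts.add(i)        # boundary before a head n-gram
            if gram in tails:
                cuts.add(i + n)    # boundary after a tail n-gram
        points = sorted(c for c in cuts if 0 < c < len(turn))
        segments = [turn[a:b]
                    for a, b in zip([0] + points, points + [len(turn)])]
        segments[0] = carry + segments[0]
        carry = segments.pop() if turn[-1] in bad_tail_words else []
        sentences.extend(segments)
    if carry:
        sentences.append(carry)
    return sentences


# Toy training set: each inner list is one sentence with known boundaries.
training = [s.split() for s in [
    "thank you for calling",
    "how may i help you",
    "thank you very much",
    "i will check that for you",
    "you know i will call",
]]
heads, tails = train_ngrams(training)

turns = [t.split() for t in [
    "thank you for calling how may i help you",
    "i want to check my",
    "bill thank you very much",
]]
for sentence in mark_boundaries(turns, heads, tails, bad_tail_words={"my"}):
    print(" ".join(sentence))
```

In the toy data, the bigram `("i", "will")` starts one training sentence but also appears once mid-sentence, so the assumed threshold filters it out of the head set. The second turn ends in the impermissible tail word "my", so no boundary is placed after it and its text is joined to the start of the third turn.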