Quick Patent Search: search global patents quickly, free commercial patent database - IPRDB

1. Granted invention patent

US12033612B2 Speech synthesis method and apparatus, and readable storage medium (in force)
Publication No.: US12033612B2
Publication date: 2024-07-09
Application No.: US17984437
Filing date: 2022-11-10
Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Yibin Zheng, Xinhui Li, Li Lu
IPC classification: G10L13/02, G10L19/04, G10L21/043
CPC classification: G10L13/02, G10L19/04, G10L21/043
Abstract: A speech synthesis method includes: converting a text input sequence into a text feature representation sequence; inputting the text feature representation sequence into an encoder including N encoding layers; the N encoding layers including an encoding layer Ei and an encoding layer Ei+1; the encoding layer Ei+1 including a first multi-head self-attention network; acquiring a first attention matrix and a historical text encoded sequence outputted by the encoding layer Ei, and generating a second attention matrix of the encoding layer Ei+1 according to residual connection between the first attention matrix and the first multi-head self-attention network and the historical text encoded sequence; and generating a target text encoded sequence of the encoding layer Ei+1 according to the second attention matrix and the historical text encoded sequence, and generating synthesized speech data matched with the text input sequence based on the target text encoded sequence.
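Illustrative sketch (not part of the patent record): the abstract describes layer Ei+1 building its attention matrix from a residual connection to the attention matrix of layer Ei. The Python below is a minimal single-head reading of that idea. It assumes the residual is added to the pre-softmax scores and that queries, keys, and values are the unprojected hidden vectors; the abstract specifies neither, and the function names are hypothetical.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def residual_attention_layer(hidden, prev_scores=None):
    """One simplified encoding layer.

    hidden: list of T vectors of dimension d (output of the previous layer).
    prev_scores: T x T pre-softmax attention scores of the previous layer,
    or None for the first layer.
    Returns (new hidden vectors, this layer's combined pre-softmax scores).
    """
    d = len(hidden[0])
    # Scaled dot-product scores; assumption: no learned projections.
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d)
               for kj in hidden] for qi in hidden]
    if prev_scores is not None:
        # Residual connection on the attention matrix itself.
        scores = [[s + p for s, p in zip(srow, prow)]
                  for srow, prow in zip(scores, prev_scores)]
    weights = [softmax(row) for row in scores]
    output = [[sum(w * v[c] for w, v in zip(wrow, hidden)) for c in range(d)]
              for wrow in weights]
    return output, scores
```

Stacking layers then means passing each layer's returned scores to the next call, for example `h2, s2 = residual_attention_layer(h1, s1)`.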

2. Granted invention patent

US09466289B2 Keyword detection with international phonetic alphabet by foreground model and background model (in force)
Publication No.: US09466289B2
Publication date: 2016-10-11
Application No.: US14103775
Filing date: 2013-12-11
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Li Lu, Xiang Zhang, Shuai Yue, Feng Rao, Eryu Wang, Lu Li
IPC classification: G10L15/06, G10L15/08
CPC classification: G10L15/063, G10L15/08, G10L2015/088
Abstract: An electronic device with one or more processors and memory trains an acoustic model with an international phonetic alphabet (IPA) phoneme mapping collection and audio samples in different languages, where the acoustic model includes: a foreground model; and a background model. The device generates a phone decoder based on the trained acoustic model. The device collects keyword audio samples, decodes the keyword audio samples with the phone decoder to generate phoneme sequence candidates, and selects a keyword phoneme sequence from the phoneme sequence candidates. After obtaining the keyword phoneme sequence, the device detects one or more keywords in an input audio signal with the trained acoustic model, including: matching phonemic keyword portions of the input audio signal with phonemes in the keyword phoneme sequence with the foreground model; and filtering out phonemic non-keyword portions of the input audio signal with the background model.

3. Granted invention patent

US09230541B2 Keyword detection for speech recognition (in force)
Publication No.: US09230541B2
Publication date: 2016-01-05
Application No.: US14567969
Filing date: 2014-12-11
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Lu Ll, Li Lu, Jianxiong Ma, Linghui Kong, Feng Rao, Shuai Yue, Xiang Zhang, Haibo Liu, Eryu Wang, Bo Chen
IPC classification: G10L15/08
CPC classification: G10L15/08, G10L15/083, G10L2015/088
Abstract: This application discloses a method of recognizing a keyword in a speech that includes a sequence of audio frames further including a current frame and a subsequent frame. A candidate keyword is determined for the current frame using a decoding network that includes keywords and filler words of multiple languages, and used to determine a confidence score for the audio frame sequence. A word option is also determined for the subsequent frame based on the decoding network, and when the candidate keyword and the word option are associated with two distinct types of languages, the confidence score of the audio frame sequence is updated at least based on a penalty factor associated with the two distinct types of languages. The audio frame sequence is then determined to include both the candidate keyword and the word option by evaluating the updated confidence score according to a keyword determination criterion.
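Illustrative sketch (not part of the patent record): the abstract says the confidence score is updated with a penalty factor when the candidate keyword and the following word option belong to different languages. The Python below shows one possible reading. The multiplicative form of the penalty, the threshold test, the language codes, and all names are assumptions made for illustration; the abstract does not state them.

```python
def update_confidence(score, keyword_lang, next_word_lang, penalties,
                      default_penalty=1.0):
    """Apply a language-switch penalty to a confidence score.

    penalties maps an unordered language pair (a frozenset) to a factor.
    Assumption: the penalty multiplies the score; the abstract only says
    the update is "based on" a penalty factor.
    """
    if keyword_lang == next_word_lang:
        return score  # same language, no penalty
    pair = frozenset((keyword_lang, next_word_lang))
    return score * penalties.get(pair, default_penalty)

def meets_criterion(score, threshold=0.5):
    # Hypothetical keyword determination criterion: a simple threshold.
    return score >= threshold
```

With `penalties = {frozenset(("zh", "en")): 0.8}`, a score of 0.6 on a Chinese keyword followed by an English word drops to about 0.48 and no longer passes a 0.5 threshold, while the same score with no language switch is left unchanged.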

4. Design patent

USD746319S1 Portion of a display screen for a graphical user interface (in force)
Publication No.: USD746319S1
Publication date: 2015-12-29
Application No.: US29473695
Filing date: 2013-11-25
Applicant: Tencent Technology (Shenzhen) Company Limited
Designers: Cheng Zhang, Jingke Leng, Lei Qin, Li Lu

5. Granted invention patent

US09818432B2 Method and computer system for performing audio search on a social networking platform (in force)
Publication No.: US09818432B2
Publication date: 2017-11-14
Application No.: US15176047
Filing date: 2016-06-07
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Lu Li, Jianxiong Ma, Li Lu
IPC classification: G10L15/26, G10L25/54, G06F17/30, G10L15/14, G10L21/10, G10L15/02, G10L15/08
CPC classification: G10L25/54, G06F17/30026, G10L15/14, G10L21/10, G10L2015/027, G10L2015/088
Abstract: Methods and computer systems for audio search on a social networking platform are disclosed. The method includes: while running a social networking application, receiving a first audio input from a user of the computer system, the first audio input including one or more search keywords; generating a first audio confusion network from the first audio input; determining whether the first audio confusion network matches at least one of one or more second audio confusion networks, wherein a respective second audio confusion network was generated from a corresponding second audio input associated with a chat session of which the user is a participant; and identifying a second audio input corresponding to the at least one second audio confusion network that matches the first audio confusion network, wherein the identified second audio input includes the one or more search keywords that are included in the first audio input.

6. Granted invention patent

US09811517B2 Method and system of adding punctuation and establishing language model using a punctuation weighting applied to Chinese speech recognized text (in force)
Publication No.: US09811517B2
Publication date: 2017-11-07
Application No.: US14148579
Filing date: 2014-01-06
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Haibo Liu, Eryu Wang, Xiang Zhang, Li Lu, Shuai Yue, Qiuge Liu, Bo Chen, Jian Liu, Lu Li
IPC classification: G06F17/27, G06F17/28, G10L15/00, G10L15/26
CPC classification: G06F17/273, G06F17/2775, G06F17/2785, G06F17/289, G10L15/265
Abstract: A method of processing information content based on a Chinese language model is performed at a computer, the method including: identifying a plurality of expressions in the information content extracted from a speech input through speech recognition that is queued to be processed; dividing the expressions into a plurality of characteristic units according to semantic features and predetermined characteristics associated with each characteristic unit, each including a subset of the expressions and the predetermined characteristics at least including a respective integer number of expressions that are included in the characteristic unit; extracting, from the Chinese language model, a plurality of probabilities for punctuation marks associated with each characteristic unit; and in accordance with the probabilities, associating a respective punctuation mark with each characteristic unit included in the information content. The method further comprises adding punctuation marks based on a weight determined for each punctuation mark.

7. Granted invention patent

US09754581B2 Reminder setting method and apparatus (in force)
Publication No.: US09754581B2
Publication date: 2017-09-05
Application No.: US13903593
Filing date: 2013-05-28
Applicant: TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Inventors: Li Lu, Feng Rao, Song Liu, Zongyao Tang, Xiang Zhang, Shuai Yue, Bo Chen
IPC classification: G10L15/00, G06Q50/00, G10L15/08, G10L15/26, G06Q10/10
CPC classification: G10L15/08, G06Q10/1097, G10L15/26, G10L2015/088
Abstract: The present invention, pertaining to the field of speech recognition, discloses a reminder setting method and apparatus. The method includes: acquiring speech signals; acquiring time information in the speech signals by using keyword recognition, and determining a reminder time for reminder setting according to the time information; acquiring a text sequence corresponding to the speech signals by using continuous speech recognition, and determining reminder content for reminder setting according to the time information and the text sequence; and setting a reminder according to the reminder time and the reminder content. According to the present invention, acquiring time information in speech signals by using keyword recognition ensures correctness of time information extraction, and achieves an effect that correct time information is still acquired by keyword recognition to set a reminder even in the case that a recognized text sequence is incorrect due to poor precision in whole text recognition in the speech recognition.

8. Granted invention patent

US09697821B2 Method and system for building a topic specific language model for use in automatic speech recognition (in force)
Publication No.: US09697821B2
Publication date: 2017-07-04
Application No.: US14108223
Filing date: 2013-12-16
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Feng Rao, Li Lu, Bo Chen, Shuai Yue, Xiang Zhang, Eryu Wang, Dadong Xie, Lou Li, Duling Lu
IPC classification: G10L15/06, G10L15/183, G10L15/197, G10L15/26
CPC classification: G10L15/063, G10L15/183, G10L15/197, G10L15/26
Abstract: An automatic speech recognition method includes, at a computer having one or more processors and memory for storing one or more programs to be executed by the processors: obtaining a plurality of speech corpus categories through classifying and calculating raw speech corpus; obtaining a plurality of classified language models that respectively correspond to the plurality of speech corpus categories through a language model training applied on each speech corpus category; obtaining an interpolation language model through implementing a weighted interpolation on each classified language model and merging the interpolated plurality of classified language models; constructing a decoding resource in accordance with an acoustic model and the interpolation language model; and decoding input speech using the decoding resource, and outputting a character string with a highest probability as a recognition result of the input speech.
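Illustrative sketch (not part of the patent record): the abstract mentions a weighted interpolation that merges several category-specific language models into one. The Python below shows standard linear interpolation over unigram probability tables as one way to picture that step. Treating each model as a plain word-to-probability dictionary, normalizing the weights, and the function name are all simplifying assumptions; the patent's actual models and weighting scheme are not given in the abstract.

```python
def interpolate_models(models, weights):
    """Linearly interpolate unigram language models.

    models: list of dicts mapping word -> probability (one per category).
    weights: list of non-negative interpolation weights, one per model.
    Returns a merged dict where P(w) = sum_i lambda_i * P_i(w), with the
    weights normalized so that the lambdas sum to 1.
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    lambdas = [w / total for w in weights]
    # Union of all vocabularies; a word missing from a model contributes 0.
    vocab = set().union(*models)
    return {word: sum(lam * m.get(word, 0.0) for lam, m in zip(lambdas, models))
            for word in vocab}
```

If every input table sums to 1, the merged table does too, so it can be used directly as a probability distribution by a decoder.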

9. Invention patent application

US20160086609A1 SYSTEMS AND METHODS FOR AUDIO COMMAND RECOGNITION (in force)
Publication No.: US20160086609A1
Publication date: 2016-03-24
Application No.: US14958606
Filing date: 2015-12-03
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Shuai Yue, Xiang Zhang, Li Lu, Feng Rao, Eryu Wang, Haibo Liu, Bo Chen, Jian Liu, Lu Li
IPC classification: G10L17/24, G10L15/22, G10L17/16, G06F3/16, G10L17/26
CPC classification: G10L17/24, G06F3/167, G10L15/22, G10L17/02, G10L17/16, G10L17/26, G10L2015/223
Abstract: The present application discloses a method, an electronic system and a non-transitory computer readable storage medium for recognizing audio commands in an electronic device. The electronic device obtains audio data based on an audio signal provided by a user and extracts characteristic audio fingerprint features from the audio data. The electronic device further determines whether the corresponding audio signal is generated by an authorized user by comparing the characteristic audio fingerprint features with an audio fingerprint model for the authorized user and with a universal background model that represents user-independent audio fingerprint features, respectively. When the corresponding audio signal is generated by the authorized user of the electronic device, an audio command is extracted from the audio data, and an operation is performed according to the audio command.
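Illustrative sketch (not part of the patent record): comparing features against both a user-specific model and a universal background model is commonly done with a log-likelihood ratio. The Python below shows that general technique in a deliberately tiny form. It assumes scalar features and a single Gaussian per model, whereas practical systems typically use mixture models over multi-dimensional features; the threshold and all names are hypothetical and are not taken from the patent.

```python
import math

def gaussian_log_pdf(x, mean, var):
    # Log density of a one-dimensional Gaussian.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def average_llr(features, speaker, ubm):
    """Average log-likelihood ratio of speaker model vs. background model.

    speaker and ubm are (mean, variance) pairs; features is a non-empty
    list of scalar feature values.
    """
    total = sum(gaussian_log_pdf(x, *speaker) - gaussian_log_pdf(x, *ubm)
                for x in features)
    return total / len(features)

def is_authorized(features, speaker, ubm, threshold=0.0):
    # Accept only if the speaker model explains the features better than
    # the background model by more than the (assumed) threshold.
    return average_llr(features, speaker, ubm) > threshold
```

A positive ratio means the features look more like the enrolled user than like a generic speaker, which is the condition under which the command would then be extracted and executed.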

10. Granted invention patent

US09177131B2 User authentication method and apparatus based on audio and video data (in force)
Publication No.: US09177131B2
Publication date: 2015-11-03
Application No.: US14262665
Filing date: 2014-04-25
Applicant: Tencent Technology (Shenzhen) Company Limited
Inventors: Xiang Zhang, Li Lu, Eryu Wang, Shuai Yue, Feng Rao, Haibo Liu, Lou Li, Duling Lu, Bo Chen
IPC classification: H04L29/06, G06F21/32
CPC classification: G06F21/32, G06F2221/2117
Abstract: A computer-implemented method is performed at a server having one or more processors and memory storing programs executed by the one or more processors for authenticating a user from video and audio data. The method includes: receiving a login request from a mobile device, the login request including video data and audio data; extracting a group of facial features from the video data; extracting a group of audio features from the audio data and recognizing a sequence of words in the audio data; identifying a first user account whose respective facial features match the group of facial features and a second user account whose respective audio features match the group of audio features. If the first user account is the same as the second user account, the method retrieves the sequence of words associated with the user account and compares the sequences of words for authentication purposes.
