    • 1. Invention patent application
    • METHOD AND DEVICE FOR PARALLEL PROCESSING IN MODEL TRAINING
    • Publication No.: US20150019214A1
    • Publication date: 2015-01-15
    • Application No.: US14108237
    • Filing date: 2013-12-16
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Eryu WANG; Li LU; Xiang ZHANG; Haibo LIU; Feng RAO; Lou LI; Shuai YUE; Bo CHEN
    • IPC: G10L15/34; G10L25/30; G10L15/06
    • CPC: G10L15/34; G06N3/02; G10L15/063; G10L15/16
    • Abstract: A method and a device for training a DNN model include: at a device including one or more processors and memory: establishing an initial DNN model; dividing a training data corpus into a plurality of disjoint data subsets; for each of the plurality of disjoint data subsets, providing the data subset to a respective training processing unit of a plurality of training processing units operating in parallel, wherein the respective training processing unit applies a Stochastic Gradient Descent (SGD) process to update the initial DNN model to generate a respective DNN sub-model based on the data subset; and merging the respective DNN sub-models generated by the plurality of training processing units to obtain an intermediate DNN model, wherein the intermediate DNN model is established as either the initial DNN model for a next training iteration or a final DNN model in accordance with a preset convergence condition.
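The training loop in this abstract (disjoint subsets, per-worker SGD, merging sub-models each iteration, preset convergence condition) can be sketched as follows. This is a toy illustration, not the patented implementation: the quadratic loss, learning rate, and uniform averaging as the merge step are all assumptions.

```python
import numpy as np

def sgd_on_subset(model, subset, lr=0.1):
    """One SGD pass over a data subset for a toy quadratic loss
    L(w) = mean((w - x)^2); the per-sample gradient is 2*(w - x)."""
    w = model.copy()
    for x in subset:
        w -= lr * 2.0 * (w - x)
    return w

def parallel_train(initial_model, corpus, num_workers=4, max_iters=50, tol=1e-6):
    """Split the corpus into disjoint subsets, run SGD per worker in
    parallel (sequentially here), then merge the sub-models -- assumed
    to be uniform averaging -- once per iteration."""
    model = initial_model.copy()
    subsets = np.array_split(corpus, num_workers)   # disjoint data subsets
    for _ in range(max_iters):
        sub_models = [sgd_on_subset(model, s) for s in subsets]
        merged = np.mean(sub_models, axis=0)        # merge step -> intermediate model
        if np.linalg.norm(merged - model) < tol:    # preset convergence condition
            return merged                           # final model
        model = merged                              # initial model for next iteration
    return model

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, size=(200, 2))           # samples centered at (3, 3)
final = parallel_train(np.zeros(2), data)
```

For this toy loss the merged model converges near the data mean, which makes the convergence check easy to observe; a real DNN would replace `sgd_on_subset` with backpropagation over mini-batches.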
    • 2. Invention patent application
    • METHOD AND SYSTEM FOR RECOGNIZING SPEECH COMMANDS
    • Publication No.: US20140214416A1
    • Publication date: 2014-07-31
    • Application No.: US14106634
    • Filing date: 2013-12-13
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Shuai YUE; Li LU; Xiang ZHANG; Dadong XIE; Haibo LIU; Bo CHEN; Jian LIU
    • IPC: G10L15/22
    • CPC: G10L15/14; G10L15/063; G10L15/083; G10L15/32; G10L2015/088; G10L2015/223
    • Abstract: A method of recognizing speech commands includes generating a background acoustic model for a sound using a first sound sample, the background acoustic model characterized by a first precision metric. A foreground acoustic model is generated for the sound using a second sound sample, the foreground acoustic model characterized by a second precision metric. A third sound sample is received and decoded by assigning a weight to the third sound sample corresponding to a probability that the sound sample originated in a foreground using the foreground acoustic model and the background acoustic model. The method further includes determining if the weight meets predefined criteria for assigning the third sound sample to the foreground and, when the weight meets the predefined criteria, interpreting the third sound sample as a portion of a speech command. Otherwise, recognition of the third sound sample as a portion of a speech command is forgone.
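The foreground/background weighting described above can be sketched with one-dimensional Gaussian stand-ins for the two acoustic models. The specific models, the sigmoid of the log-likelihood ratio as the "weight", and the 0.5 threshold are illustrative assumptions, not details from the patent.

```python
import math

def gaussian_loglik(x, mean, var):
    """Log-likelihood of a scalar sample under a 1-D Gaussian acoustic model."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def foreground_weight(sample, fg, bg):
    """Weight corresponding to the probability that the sample originated
    in the foreground, from the two models' likelihoods."""
    lf = gaussian_loglik(sample, *fg)
    lb = gaussian_loglik(sample, *bg)
    return 1.0 / (1.0 + math.exp(lb - lf))   # sigmoid of the log-likelihood ratio

def decode(sample, fg, bg, threshold=0.5):
    """Interpret the sample as part of a speech command only if the weight
    meets the predefined criterion; otherwise recognition is forgone."""
    w = foreground_weight(sample, fg, bg)
    return ("command", w) if w >= threshold else ("rejected", w)

# Illustrative (mean, variance) pairs: foreground speech near 1.0 with a
# tighter precision, background noise near 0.0 with a looser one.
fg_model = (1.0, 0.1)
bg_model = (0.0, 0.5)

label, w = decode(0.9, fg_model, bg_model)
```

A sample near the foreground mean gets a high weight and is accepted; one near the background mean is rejected rather than misread as a command.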
    • 3. Invention patent application
    • USER AUTHENTICATION METHOD AND APPARATUS BASED ON AUDIO AND VIDEO DATA
    • Publication No.: US20140237576A1
    • Publication date: 2014-08-21
    • Application No.: US14262665
    • Filing date: 2014-04-25
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Xiang ZHANG; Li LU; Eryu WANG; Shuai YUE; Feng RAO; Haibo LIU; Lou LI; Duling LU; Bo CHEN
    • IPC: G06F21/32
    • CPC: G06F21/32; G06F2221/2117
    • Abstract: A computer-implemented method is performed at a server having one or more processors and memory storing programs executed by the one or more processors for authenticating a user from video and audio data. The method includes: receiving a login request from a mobile device, the login request including video data and audio data; extracting a group of facial features from the video data; extracting a group of audio features from the audio data and recognizing a sequence of words in the audio data; identifying a first user account whose respective facial features match the group of facial features and a second user account whose respective audio features match the group of audio features. If the first user account is the same as the second user account, the server retrieves the sequence of words associated with the user account and compares the sequences of words for authentication purposes.
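The dual-modality check above (face match and voice match must resolve to the same account, then the spoken passphrase is compared) can be sketched as follows. The account store, exact-equality "feature matching", and passphrase table are toy assumptions; a real system would compare feature vectors against a distance threshold.

```python
def authenticate(login, accounts, passphrases):
    """Match facial and audio features to accounts independently; grant
    access only if both resolve to the same account AND the recognized
    word sequence matches the one registered for that account."""
    face_match = None
    voice_match = None
    for user, profile in accounts.items():
        if profile["face"] == login["face"]:
            face_match = user          # first account: facial-feature match
        if profile["voice"] == login["voice"]:
            voice_match = user         # second account: audio-feature match
    if face_match is None or face_match != voice_match:
        return False
    return passphrases[face_match] == login["words"]

# Toy registered accounts and passphrases (hypothetical data).
accounts = {
    "alice": {"face": "F1", "voice": "V1"},
    "bob":   {"face": "F2", "voice": "V2"},
}
passphrases = {"alice": ["open", "sesame"], "bob": ["hello", "world"]}

ok = authenticate({"face": "F1", "voice": "V1", "words": ["open", "sesame"]},
                  accounts, passphrases)
```

Requiring both modalities to agree before the passphrase check is what distinguishes this scheme from single-factor face or voice login.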
    • 4. Invention patent application
    • METHOD AND DEVICE FOR VOICEPRINT RECOGNITION
    • Publication No.: US20160358610A1
    • Publication date: 2016-12-08
    • Application No.: US15240696
    • Filing date: 2016-08-18
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Eryu WANG; Li LU; Xiang ZHANG; Haibo LIU; Lou LI; Feng RAO; Duling LU; Shuai YUE; Bo CHEN
    • IPC: G10L17/18; G10L17/04; G10L17/02; G10L17/08
    • CPC: G10L17/18; G10L17/02; G10L17/04; G10L17/08
    • Abstract: A method is performed at a device having one or more processors and memory. The device establishes a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data. The device establishes a second-level DNN model by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, wherein the second-level DNN model specifies a plurality of high-level voiceprint features. Using the second-level DNN model, the device registers a first high-level voiceprint feature sequence for a user based on a registration speech sample received from the user. The device performs speaker verification for the user based on the first high-level voiceprint feature sequence registered for the user.
    • 5. Invention patent application
    • Method and Apparatus For Performing Speech Keyword Retrieval
    • Publication No.: US20150154955A1
    • Publication date: 2015-06-04
    • Application No.: US14620000
    • Filing date: 2015-02-11
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Jianxiong MA; Lu LI; Li LU; Xiang ZHANG; Shuai YUE; Feng RAO; Eryu WANG; Linghui KONG
    • IPC: G10L15/18
    • CPC: G10L15/18; G10L15/08; G10L15/28; G10L15/32; G10L2015/088
    • Abstract: A method and an apparatus are provided for retrieving keywords. The apparatus configures at least two types of language models in a model file, where each type of language model includes a recognition model and a corresponding decoding model; the apparatus extracts a speech feature from the to-be-processed speech data; performs language matching on the extracted speech feature by using the recognition models in the model file one by one, and determines a recognition model based on a language matching rate; determines a decoding model corresponding to the recognition model; decodes the extracted speech feature by using the determined decoding model, and obtains a word recognition result after the decoding; and matches keywords in a keyword dictionary against the word recognition result, and outputs the matched keywords.
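The select-then-decode pipeline above (pick the recognition model with the best language matching rate, decode with its paired decoding model, then match against the keyword dictionary) can be sketched as follows. The token-overlap "matching rate" and vocabulary-filter "decoding" are toy stand-ins; a real system would operate on acoustic features.

```python
def retrieve_keywords(speech_feature, model_file, keyword_dict):
    """Pick the recognition model with the highest language matching rate,
    decode with its paired decoding model, then match the word
    recognition result against the keyword dictionary."""
    best = max(model_file, key=lambda m: m["recognize"](speech_feature))
    words = best["decode"](speech_feature)          # word recognition result
    return [w for w in words if w in keyword_dict]  # matched keywords

def make_model(vocab):
    """Toy model pair: matching rate = fraction of known tokens;
    decoding = keeping known tokens."""
    return {
        "recognize": lambda feat, v=vocab: sum(t in v for t in feat) / len(feat),
        "decode":    lambda feat, v=vocab: [t for t in feat if t in v],
    }

model_file = [
    make_model({"ni", "hao", "shijie"}),       # e.g. a Mandarin model pair
    make_model({"hello", "world", "please"}),  # e.g. an English model pair
]
keywords = {"hello", "world"}

result = retrieve_keywords(["hello", "world", "xyz"], model_file, keywords)
```

The second model wins the matching-rate comparison here, so its decoder is the one applied, mirroring the two-stage selection in the abstract.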
    • 6. Invention patent application
    • METHOD AND DEVICE FOR KEYWORD DETECTION
    • Publication No.: US20140236600A1
    • Publication date: 2014-08-21
    • Application No.: US14103775
    • Filing date: 2013-12-11
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Li LU; Xiang ZHANG; Shuai YUE; Feng RAO; Eryu WANG; Lu LI
    • IPC: G10L15/06
    • CPC: G10L15/063; G10L15/08; G10L2015/088
    • Abstract: An electronic device with one or more processors and memory trains an acoustic model with an international phonetic alphabet (IPA) phoneme mapping collection and audio samples in different languages, where the acoustic model includes: a foreground model; and a background model. The device generates a phone decoder based on the trained acoustic model. The device collects keyword audio samples, decodes the keyword audio samples with the phone decoder to generate phoneme sequence candidates, and selects a keyword phoneme sequence from the phoneme sequence candidates. After obtaining the keyword phoneme sequence, the device detects one or more keywords in an input audio signal with the trained acoustic model, including: matching phonemic keyword portions of the input audio signal with phonemes in the keyword phoneme sequence with the foreground model; and filtering out phonemic non-keyword portions of the input audio signal with the background model.
    • 7. Invention patent application
    • METHOD AND DEVICE FOR ERROR CORRECTION MODEL TRAINING AND TEXT ERROR CORRECTION
    • Publication No.: US20140214401A1
    • Publication date: 2014-07-31
    • Application No.: US14106642
    • Filing date: 2013-12-13
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Lou LI; Qiang CHENG; Feng RAO; Li LU; Xiang ZHANG; Shuai YUE; Bo CHEN
    • IPC: G06F17/21
    • CPC: G06F17/273
    • Abstract: A computer-implemented method is performed at a device having one or more processors and memory storing programs executed by the one or more processors. The method comprises: selecting a target word in a target sentence; from the target sentence, acquiring a first sequence of words that precede the target word and a second sequence of words that succeed the target word; from a sentence database, searching and acquiring a group of candidate words, each of which separates the first sequence of words from the second sequence of words in a sentence; creating a candidate sentence for each of the candidate words by replacing the target word in the target sentence with that candidate word; determining the fittest sentence among the candidate sentences according to a linguistic model; and suggesting the candidate word within the fittest sentence as a correction.
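The candidate-generation and scoring steps above can be sketched as follows. The word-list sentence database and the unigram-count stand-in for the "linguistic model" are toy assumptions; the patent does not specify either.

```python
from collections import Counter

def candidates_from_db(prefix, suffix, sentence_db):
    """Find words that separate the prefix word sequence from the suffix
    word sequence in some database sentence."""
    found = []
    for sent in sentence_db:
        for i in range(len(prefix), len(sent) - len(suffix)):
            if (sent[i - len(prefix):i] == prefix
                    and sent[i + 1:i + 1 + len(suffix)] == suffix):
                found.append(sent[i])
    return found

def correct(sentence, target_idx, sentence_db, lm_counts):
    """Replace the target word with each candidate, score the candidate
    sentences with a toy unigram model, and suggest the fittest one."""
    prefix, suffix = sentence[:target_idx], sentence[target_idx + 1:]
    cands = candidates_from_db(prefix, suffix, sentence_db)
    if not cands:
        return sentence[target_idx]            # nothing better found
    return max(cands, key=lambda w: lm_counts[w])

db = [["i", "like", "green", "tea", "a", "lot"],
      ["i", "like", "strong", "tea", "a", "lot"]]
lm = Counter({"green": 5, "strong": 2})

# Target sentence with a suspected error at index 2 ("grean").
suggestion = correct(["i", "like", "grean", "tea", "a", "lot"], 2, db, lm)
```

Both "green" and "strong" fit the surrounding context; the linguistic model breaks the tie, which is exactly the role the abstract assigns it.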
    • 8. Invention patent application
    • METHOD AND SYSTEM FOR AUTOMATIC SPEECH RECOGNITION
    • Publication No.: US20140236591A1
    • Publication date: 2014-08-21
    • Application No.: US14263958
    • Filing date: 2014-04-28
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Shuai YUE; Li Lu; Xiang Zhang; Dadong Xie; Bo Chen; Feng Rao
    • IPC: G10L15/22
    • CPC: G10L15/193; G10L15/083
    • Abstract: A method of recognizing speech is provided that includes generating a decoding network that includes a primary sub-network and a classification sub-network. The primary sub-network includes a classification node corresponding to the classification sub-network. The classification sub-network corresponds to a group of uncommon words. A speech input is received and decoded by instantiating a token in the primary sub-network and passing the token through the primary network. When the token reaches the classification node, the method includes transferring the token to the classification sub-network and passing the token through the classification sub-network. When the token reaches an accept node of the classification sub-network, the method includes returning a result of the token passing through the classification sub-network to the primary sub-network. The result includes one or more words in the group of uncommon words. A string corresponding to the speech input is output that includes the one or more words.
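The token-passing detour above can be sketched with dictionaries standing in for the networks: the primary net routes some units to a classification marker, and the matching sub-network's "accept node" returns an uncommon word back to the primary decode. The `<class:...>` marker convention and the toy nets are assumptions for illustration only.

```python
def decode(units, primary_net, class_nets):
    """Token-passing sketch: the primary sub-network handles common words;
    when the token reaches a classification node (a '<class:...>' marker
    here), it is transferred to that classification sub-network, whose
    accept node returns an uncommon word to the primary network."""
    words = []
    for u in units:
        hyp = primary_net.get(u, u)
        if hyp.startswith("<class:"):
            group = hyp[len("<class:"):-1]
            hyp = class_nets[group].get(u, "<unk>")   # result from sub-network
        words.append(hyp)
    return " ".join(words)                            # output string

# Toy networks: rare place names live in a separate sub-network, so the
# primary network stays small and the uncommon-word list is swappable.
primary_net = {"u1": "fly", "u2": "to", "u3": "<class:city>"}
city_net = {"u3": "ouagadougou"}                      # group of uncommon words

out = decode(["u1", "u2", "u3"], primary_net, {"city": city_net})
```

Keeping uncommon words in their own sub-network means the primary network never has to be rebuilt when that word list changes, which is the practical point of the split.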
    • 9. Invention patent application
    • METHOD AND DEVICE FOR ACOUSTIC LANGUAGE MODEL TRAINING
    • Publication No.: US20140222417A1
    • Publication date: 2014-08-07
    • Application No.: US14109845
    • Filing date: 2013-12-17
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Duling LU; Lu LI; Feng RAO; Bo CHEN; Li LU; Xiang ZHANG; Eryu WANG; Shuai YUE
    • IPC: G10L15/06; G06F17/28
    • CPC: G10L15/063; G06F17/28; G10L15/183
    • Abstract: A method and a device for training an acoustic language model include: conducting word segmentation for training samples in a training corpus using an initial language model containing no word class labels, to obtain initial word segmentation data containing no word class labels; performing word class replacement for the initial word segmentation data containing no word class labels, to obtain first word segmentation data containing word class labels; using the first word segmentation data containing word class labels to train a first language model containing word class labels; using the first language model containing word class labels to conduct word segmentation for the training samples in the training corpus, to obtain second word segmentation data containing word class labels; and in accordance with the second word segmentation data meeting one or more predetermined criteria, using the second word segmentation data containing word class labels to train the acoustic language model.
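The segment → class-replace → retrain → re-segment pipeline above can be sketched as a function over pluggable components. The whitespace segmenter, bag-of-words "training", and trivial convergence check are toy stand-ins supplied by the caller; only the ordering of the steps comes from the abstract.

```python
def bootstrap_language_model(corpus, class_map, segment, train, converged):
    """Pipeline from the abstract: segment with a label-free model, swap in
    word-class labels, train a class-labeled model, re-segment with it, and
    train the final model from the second segmentation once the
    predetermined criteria are met."""
    lm0 = train([])                                    # initial model, no class labels
    seg1 = [segment(lm0, s) for s in corpus]           # initial segmentation data
    labeled1 = [[class_map.get(w, w) for w in s] for s in seg1]
    lm1 = train(labeled1)                              # first class-labeled model
    seg2 = [segment(lm1, s) for s in corpus]           # second segmentation data
    labeled2 = [[class_map.get(w, w) for w in s] for s in seg2]
    if converged(labeled2):
        return train(labeled2)                         # final acoustic language model
    return lm1

# Toy plug-ins: whitespace "segmentation", a vocabulary-set "model", and a
# convergence check that always accepts the second segmentation.
def toy_segment(model, sentence):
    return sentence.split()

def toy_train(data):
    return {w for sent in data for w in sent}

lm = bootstrap_language_model(
    ["book a flight to beijing", "book a hotel in shanghai"],
    {"beijing": "<CITY>", "shanghai": "<CITY>"},
    toy_segment, toy_train, lambda d: True)
```

Note how the class replacement lets both city names share one `<CITY>` slot in the final model, which is the generalization benefit the word-class labels are for.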
    • 10. Invention patent application
    • METHOD AND DEVICE FOR VOICEPRINT RECOGNITION
    • Publication No.: US20140214417A1
    • Publication date: 2014-07-31
    • Application No.: US14105110
    • Filing date: 2013-12-12
    • Assignee: Tencent Technology (Shenzhen) Company Limited
    • Inventors: Eryu WANG; Li LU; Xiang ZHANG; Haibo LIU; Lou LI; Feng RAO; Duling LU; Shuai YUE; Bo CHEN
    • IPC: G10L17/00
    • CPC: G10L17/18; G10L17/02; G10L17/04; G10L17/08
    • Abstract: A method and device for voiceprint recognition include: establishing a first-level Deep Neural Network (DNN) model based on unlabeled speech data, the unlabeled speech data containing no speaker labels and the first-level DNN model specifying a plurality of basic voiceprint features for the unlabeled speech data; obtaining a plurality of high-level voiceprint features by tuning the first-level DNN model based on labeled speech data, the labeled speech data containing speech samples with respective speaker labels, and the tuning producing a second-level DNN model specifying the plurality of high-level voiceprint features; based on the second-level DNN model, registering a respective high-level voiceprint feature sequence for a user based on a registration speech sample received from the user; and performing speaker verification for the user based on the respective high-level voiceprint feature sequence registered for the user.