Quick patent search: fast search of global patents, free patent database for commercial use - IPRDB

1. Granted patent

US06754630B2 Synthesis of speech from pitch prototype waveforms by time-synchronous waveform interpolation (lapsed)
Publication No.: US06754630B2
Publication date: 2004-06-22
Application No.: US09191631
Application date: 1998-11-13
Applicant: Amitava Das, Eddie L. T. Choy
Inventor: Amitava Das, Eddie L. T. Choy
IPC classification: G10L13/04
CPC classification: G10L19/0204, G10L25/27
Abstract: In a method of synthesizing voiced speech from pitch prototype waveforms by time-synchronous waveform interpolation (TSWI), one or more pitch prototypes are extracted from a speech signal or a residue signal. The extraction process is performed in such a way that the prototype has minimum energy at the boundary. Each prototype is circularly shifted so as to be time-synchronous with the original signal. A linear phase shift is applied to each extracted prototype relative to the previously extracted prototype so as to maximize the cross-correlation between successive extracted prototypes. A two-dimensional prototype-evolving surface is constructed by upsampling the prototypes to every sample point. The two-dimensional prototype-evolving surface is re-sampled to generate a one-dimensional, synthesized signal frame with sample points defined by piecewise continuous cubic phase contour functions computed from the pitch lags and the phase shifts added to the extracted prototypes. A pre-selection filter may be applied to determine whether to abandon the TSWI technique in favor of another algorithm for the current frame. A post-selection performance measure may be obtained and compared with a predetermined threshold to determine whether the TSWI algorithm is performing adequately.
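
The alignment step described in this abstract (shifting each prototype to maximize its cross-correlation with the previous one) can be illustrated with a minimal Python sketch. It is not the patented method: it assumes equal-length prototypes and searches integer circular shifts in the time domain, whereas the abstract describes a linear phase shift. All function names are hypothetical.

```python
def circular_cross_correlation(a, b, shift):
    """Correlate `a` with `b` circularly shifted right by `shift` samples."""
    n = len(a)
    return sum(a[i] * b[(i - shift) % n] for i in range(n))


def best_circular_shift(previous, current):
    """Return the right shift of `current` that best matches `previous`."""
    if len(previous) != len(current):
        raise ValueError("this sketch assumes equal-length prototypes")
    return max(
        range(len(current)),
        key=lambda s: circular_cross_correlation(previous, current, s),
    )


def rotate(prototype, shift):
    """Circularly shift a prototype to the right by `shift` samples."""
    n = len(prototype)
    return [prototype[(i - shift) % n] for i in range(n)]


# Toy data (assumed): `current` is `previous` rotated left by two samples.
previous = [0.0, 1.0, 0.5, 0.0, -0.5, 0.0]
current = [0.5, 0.0, -0.5, 0.0, 0.0, 1.0]
shift = best_circular_shift(previous, current)
aligned = rotate(current, shift)
```

With this toy data the search recovers the two-sample offset, so `aligned` matches `previous` sample for sample.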

2. Granted patent

US06640209B1 Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder (in force)
Publication No.: US06640209B1
Publication date: 2003-10-28
Application No.: US09259151
Application date: 1999-02-26
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G10L19/04
CPC classification: G10L19/18
Abstract: A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder. Transition speech (i.e., from unvoiced speech to voiced speech, or vice versa) frames are encoded with the high-rate, time-domain coding mode, which may be a CELP coding mode. Voiced speech frames are encoded with the low-rate, frequency-domain coding mode, which may be a harmonic coding mode. Phase parameters are not encoded by the frequency-domain coding mode, and are instead modeled in accordance with, e.g., a quadratic phase model. For each speech frame encoded with the frequency-domain coding mode, the initial phase value is taken to be the initial phase value of the immediately preceding speech frame encoded with the frequency-domain coding mode. If the immediately preceding speech frame was encoded with the time-domain coding mode, the initial phase value of the current speech frame is computed from the decoded speech frame information of the immediately preceding, time-domain-encoded speech frame. Each speech frame encoded with the frequency-domain coding mode may be compared with the corresponding input speech frame to obtain a performance measure. If the performance measure falls below a predefined threshold value, the input speech frame is encoded with the time-domain coding mode.
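
The closed-loop fallback in this abstract (try the frequency-domain mode, compare the result with the input, and switch to the time-domain mode if the measure is too low) might look roughly like the Python sketch below. The abstract does not say which performance measure is used; signal-to-noise ratio, the 10 dB threshold, and the toy codecs are assumptions made only for illustration.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio in dB (assumed performance measure)."""
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0.0:
        return math.inf
    if signal == 0.0:
        return -math.inf
    return 10.0 * math.log10(signal / noise)


def select_mode(frame, is_voiced, harmonic_codec, celp_codec, threshold_db=10.0):
    """Return (mode, decoded_frame) for one frame.

    `harmonic_codec` and `celp_codec` are hypothetical callables that
    encode and decode a frame and return the reconstruction.
    """
    if is_voiced:
        decoded = harmonic_codec(frame)
        # Closed-loop check: keep the low-rate mode only if it is good enough.
        if snr_db(frame, decoded) >= threshold_db:
            return "frequency-domain", decoded
    return "time-domain", celp_codec(frame)


# Toy stand-ins for real codecs (assumed).
frame = [1.0, -1.0, 1.0, -1.0]
celp = lambda f: list(f)
good_harmonic = lambda f: [0.9 * x for x in f]
poor_harmonic = lambda f: [0.0 for _ in f]

mode_good, _ = select_mode(frame, True, good_harmonic, celp)
mode_poor, _ = select_mode(frame, True, poor_harmonic, celp)
```

Here the well-matched reconstruction stays in the frequency-domain mode, while the poor one is re-encoded in the time-domain mode.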

3. Patent application

US20120016673A1 Speaker recognition via voice sample based on multiple nearest neighbor classifiers (in force)
Publication No.: US20120016673A1
Publication date: 2012-01-19
Application No.: US13246681
Application date: 2011-09-27
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G10L17/00
CPC classification: G10L17/10
Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
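
A rough Python sketch of the score-combination idea in this abstract follows. It assumes each classifier has already reduced the input voice sample to one code vector, that a classifier's score for a speaker is the Euclidean distance to the nearest entry in that speaker's codebook, and that scores are combined by summation. None of these choices is stated in the abstract, and all names are hypothetical.

```python
import math


def nearest_distance(code_vector, codebook):
    """Distance from the input code vector to the closest codebook entry."""
    return min(math.dist(code_vector, entry) for entry in codebook)


def overall_scores(input_vectors, codebook_stores):
    """Sum per-classifier distances for every enrolled speaker.

    input_vectors[k] is the input code vector seen by classifier k;
    codebook_stores[k] maps speaker name -> codebook for classifier k.
    """
    totals = {}
    for vector, store in zip(input_vectors, codebook_stores):
        for speaker, codebook in store.items():
            totals[speaker] = totals.get(speaker, 0.0) + nearest_distance(vector, codebook)
    return totals


def identify_speaker(input_vectors, codebook_stores, max_distance):
    """Return the best-matching speaker, or None if the criterion fails."""
    totals = overall_scores(input_vectors, codebook_stores)
    speaker = min(totals, key=totals.get)
    return speaker if totals[speaker] <= max_distance else None


# Toy codebook stores for two classifiers with different feature sets (assumed).
stores = [
    {"alice": [(0.0, 0.0), (1.0, 0.0)], "bob": [(5.0, 5.0)]},
    {"alice": [(2.0,)], "bob": [(9.0,)]},
]
match = identify_speaker([(1.0, 0.0), (2.0,)], stores, max_distance=1.0)
no_match = identify_speaker([(100.0, 100.0), (50.0,)], stores, max_distance=1.0)
```

The `max_distance` argument plays the role of the recognition criterion: an input far from every codebook is rejected rather than assigned to the nearest speaker.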

4. Granted patent

US08050919B2 Speaker recognition via voice sample based on multiple nearest neighbor classifiers (in force)
Publication No.: US08050919B2
Publication date: 2011-11-01
Application No.: US11771794
Application date: 2007-06-29
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G10L17/00
CPC classification: G10L17/10
Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.

5. Patent application

US20090006093A1 Speaker recognition via voice sample based on multiple nearest neighbor classifiers (in force)
Publication No.: US20090006093A1
Publication date: 2009-01-01
Application No.: US11771794
Application date: 2007-06-29
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G10L15/00, G10L17/00
CPC classification: G10L17/10
Abstract: A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.

6. Granted patent

US07310307B1 System and method for authenticating an element in a network environment (in force)
Publication No.: US07310307B1
Publication date: 2007-12-18
Application No.: US10322128
Application date: 2002-12-17
Applicant: Amitava Das, Michael A. Wright, Joseph A. Salowey, William C. Gossman
Inventor: Amitava Das, Michael A. Wright, Joseph A. Salowey, William C. Gossman
IPC classification: H04J3/12, H04Q7/24, H04L12/66
CPC classification: H04L12/66, H04L63/0892, H04W12/06, H04W84/12, H04W88/14, H04W88/16
Abstract: A method for authenticating an element in a network environment is provided that includes receiving a request for one or more triplets. One or more of the triplets may be associated with an authentication communications protocol that may be executed in order to facilitate a communication session. The method further includes returning one or more of the triplets in response to the request and initiating the communication session in response to the triplets after proper authentication of an entity associated with the request.

7. Patent application

US20050043944A1 Low bit-rate coding of unvoiced segments of speech (in force)
Publication No.: US20050043944A1
Publication date: 2005-02-24
Application No.: US10954851
Application date: 2004-09-29
Applicant: Amitava Das, Sharath Manjunath
Inventor: Amitava Das, Sharath Manjunath
IPC classification: G10L19/00, G10L19/04, G10L19/08, G10L19/14, H03M7/30, G10L11/06
CPC classification: G10L19/18, G10L19/08, G10L25/21
Abstract: A low-bit-rate coding technique for unvoiced segments of speech includes the steps of extracting high-time-resolution energy coefficients from a frame of speech, quantizing the energy coefficients, generating a high-time-resolution energy envelope from the quantized energy coefficients, and reconstituting a residue signal by shaping a randomly generated noise vector with quantized values of the energy envelope. The energy envelope may be generated with a linear interpolation technique. A post-processing measure may be obtained and compared with a predefined threshold to determine whether the coding algorithm is performing adequately.
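
The steps in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: energy coefficients are taken to be mean-square values over equal-length subframes, the quantization step is omitted, the frame length is assumed to be a multiple of the subframe count, and the noise is Gaussian. The function names are hypothetical.

```python
import math
import random


def subframe_energies(frame, subframes):
    """Mean-square energy of each equal-length subframe."""
    size = len(frame) // subframes
    return [
        sum(x * x for x in frame[k * size:(k + 1) * size]) / size
        for k in range(subframes)
    ]


def linear_envelope(energies, length):
    """Linearly interpolate per-subframe RMS gains to one gain per sample."""
    gains = [math.sqrt(e) for e in energies]
    size = length // len(gains)
    envelope = []
    for n in range(length):
        # Position of sample n in subframe units, relative to subframe centres.
        pos = (n + 0.5) / size - 0.5
        pos = min(max(pos, 0.0), len(gains) - 1.0)
        lo = int(pos)
        hi = min(lo + 1, len(gains) - 1)
        frac = pos - lo
        envelope.append((1.0 - frac) * gains[lo] + frac * gains[hi])
    return envelope


def synthesize_unvoiced(energies, length, seed=0):
    """Shape a random noise vector with the interpolated energy envelope."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
    return [g * w for g, w in zip(linear_envelope(energies, length), noise)]


energies = subframe_energies([1.0, 1.0, 2.0, 2.0], 2)
residue = synthesize_unvoiced(energies, 8)
```

A real coder would quantize `energies` before building the envelope, so that encoder and decoder shape the noise with the same values.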

8. Granted patent

US06449592B1 Method and apparatus for tracking the phase of a quasi-periodic signal (in force)
Publication No.: US06449592B1
Publication date: 2002-09-10
Application No.: US09259247
Application date: 1999-02-26
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G10L19/14
CPC classification: G10L19/02
Abstract: A method for tracking the phase of a quasi-periodic signal includes the steps of estimating the phase of the signal for frames during which the signal is periodic, monitoring the performance of the estimated phase with a closed-loop performance measure, and measuring the phase of the signal for frames during which the signal is periodic and performance of the estimated phase falls below a predefined threshold level. In estimating the phase, the initial phase value is set equal to the estimated final phase value of the previous frame if the previous frame was periodic. The initial phase value is set equal to a measured phase value of the previous frame if the previous frame was nonperiodic, or if the previous frame was periodic and performance of the estimated phase for the previous frame fell below the predefined threshold level. For frames during which the signal is nonperiodic, the phase of the signal is measured. An open-loop periodicity decision can be used to determine whether the signal is periodic for a given frame.
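
The rule for choosing the initial phase of a frame, as stated in this abstract, reduces to a small decision function. The Python sketch below is an illustration only; the `PreviousFrame` record and its field names are hypothetical, and the closed-loop performance test is represented by a precomputed boolean.

```python
from dataclasses import dataclass


@dataclass
class PreviousFrame:
    periodic: bool                 # open-loop periodicity decision
    estimate_ok: bool              # closed-loop measure met the threshold
    estimated_final_phase: float   # final phase from the phase estimate
    measured_phase: float          # phase measured directly from the signal


def initial_phase(prev: PreviousFrame) -> float:
    """Pick the initial phase for the current frame from the previous one."""
    if prev.periodic and prev.estimate_ok:
        # The estimate was tracking well, so continue from where it ended.
        return prev.estimated_final_phase
    # Nonperiodic frame, or the estimate drifted: restart from a measurement.
    return prev.measured_phase
```

For example, `initial_phase(PreviousFrame(True, False, 1.5, 1.2))` falls back to the measured value because the previous estimate failed the closed-loop check.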

9. Granted patent

US06260017B1 Multipulse interpolative coding of transition speech frames (in force)
Publication No.: US06260017B1
Publication date: 2001-07-10
Application No.: US09307294
Application date: 1999-05-07
Applicant: Amitava Das, Sharath Manjunath
Inventor: Amitava Das, Sharath Manjunath
IPC classification: G10L19/10
CPC classification: G10L19/18, G10L19/10
Abstract: A multipulse interpolative coder for transition speech frames includes an extractor configured to represent a first frame of transitional speech samples by a subset of the samples of the frame. The coder also includes an interpolator configured to interpolate the subset of samples and a subset of samples extracted from an earlier-received frame to synthesize other samples of the first frame that are not included in the subset. The subset of samples is further simplified by selecting a set of pulses from the subset and assigning zero values to unselected pulses. In the alternative, a portion of the unselected pulses may be quantized. The set of pulses may be the pulses having the greatest absolute amplitudes in the subset. In the alternative, the set of pulses may be the most perceptually significant pulses of the subset.
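
The pulse-selection variant in this abstract that keeps the greatest absolute amplitudes can be sketched in a few lines of Python. The function name and the `count` parameter are hypothetical; the perceptual-significance variant and the interpolation step are not shown.

```python
def select_pulses(samples, count):
    """Keep the `count` largest-magnitude samples and zero the rest."""
    ranked = sorted(range(len(samples)), key=lambda i: abs(samples[i]), reverse=True)
    kept = set(ranked[:count])
    return [x if i in kept else 0.0 for i, x in enumerate(samples)]


sparse = select_pulses([0.1, -0.9, 0.3, 0.7, -0.2], 2)
```

The retained pulses keep their original positions and signs, which is what allows the remaining samples to be treated as zero-valued.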

10. Granted patent

US07991199B2 Object identification and verification using transform vector quantization (in force)
Publication No.: US07991199B2
Publication date: 2011-08-02
Application No.: US11771879
Application date: 2007-06-29
Applicant: Amitava Das
Inventor: Amitava Das
IPC classification: G06K9/00
CPC classification: G06K9/6223, G06K9/6272
Abstract: An identification system uses mappings of known objects to codebooks representing those objects to identify an object represented by multiple input representations or to verify that an input representation corresponds to an input known object. To identify the object, the identification system generates an input feature vector for each input representation. The identification system then accumulates for each known object the distances between the codebook of that object and each of the input feature vectors. The distance between a codebook and a feature vector may be the minimum of the distances between the code vectors of the codebook and the feature vector. The identification system then selects the object with the smallest accumulated distance as being the object represented by the multiple input representations.
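
The identification rule in this abstract (accumulate, for each known object, the minimum codebook distance over all input feature vectors, then pick the smallest total) maps directly onto a short Python sketch. Euclidean distance and the function names are assumptions, and the transform step that produces the feature vectors is not shown.

```python
import math


def codebook_distance(codebook, feature_vector):
    """Minimum distance between the feature vector and any code vector."""
    return min(math.dist(code_vector, feature_vector) for code_vector in codebook)


def identify_object(codebooks, feature_vectors):
    """Return the known object with the smallest accumulated distance."""
    accumulated = {
        name: sum(codebook_distance(codebook, fv) for fv in feature_vectors)
        for name, codebook in codebooks.items()
    }
    return min(accumulated, key=accumulated.get)


# Toy codebooks for two known objects (assumed data).
codebooks = {
    "object_a": [(0.0, 0.0), (1.0, 1.0)],
    "object_b": [(4.0, 4.0), (5.0, 5.0)],
}
result = identify_object(codebooks, [(0.9, 1.1), (0.1, 0.0), (1.0, 0.8)])
```

Accumulating over several input representations means a single noisy input is less likely to change the result than it would be with a one-shot nearest-neighbour lookup.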
