    • 1. Granted invention patent
    • Synthesis of speech from pitch prototype waveforms by time-synchronous waveform interpolation
    • US06754630B2
    • 2004-06-22
    • US09191631
    • 1998-11-13
    • Amitava Das; Eddie L. T. Choy
    • Amitava Das; Eddie L. T. Choy
    • G10L13/04
    • G10L19/0204; G10L25/27
    • In a method of synthesizing voiced speech from pitch prototype waveforms by time-synchronous waveform interpolation (TSWI), one or more pitch prototypes are extracted from a speech signal or a residue signal. The extraction process is performed in such a way that the prototype has minimum energy at the boundary. Each prototype is circularly shifted so as to be time-synchronous with the original signal. A linear phase shift is applied to each extracted prototype relative to the previously extracted prototype so as to maximize the cross-correlation between successive extracted prototypes. A two-dimensional prototype-evolving surface is constructed by upsampling the prototypes to every sample point. The two-dimensional prototype-evolving surface is re-sampled to generate a one-dimensional, synthesized signal frame with sample points defined by piecewise continuous cubic phase contour functions computed from the pitch lags and the phase shifts added to the extracted prototypes. A pre-selection filter may be applied to determine whether to abandon the TSWI technique in favor of another algorithm for the current frame. A post-selection performance measure may be obtained and compared with a predetermined threshold to determine whether the TSWI algorithm is performing adequately.
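The alignment step in the abstract above (shifting each prototype so that its cross-correlation with the previous prototype is maximal) can be illustrated with a minimal sketch. It assumes both prototypes have the same length and works on integer circular shifts in the time domain, which is a simplification of the linear phase shift the abstract describes. The function names `circular_xcorr`, `best_circular_shift`, and `rotate` are hypothetical and do not come from the patent.

```python
def rotate(x, shift):
    """Circularly rotate x to the right by `shift` samples."""
    n = len(x)
    return [x[(i - shift) % n] for i in range(n)]


def circular_xcorr(a, b, shift):
    """Correlation between a and b after rotating b right by `shift`."""
    n = len(a)
    return sum(a[i] * b[(i - shift) % n] for i in range(n))


def best_circular_shift(prev, cur):
    """Return the right-rotation of `cur` that best matches `prev`.

    Assumption: both prototypes have equal length. A real coder would
    first bring prototypes of different pitch lags to a common length.
    """
    if len(prev) != len(cur):
        raise ValueError("prototypes must have the same length")
    return max(range(len(prev)), key=lambda s: circular_xcorr(prev, cur, s))


prev = [0, 1, 4, 1, 0, 0, 0, 0]
cur = rotate(prev, -3)             # same pulse, displaced by three samples
shift = best_circular_shift(prev, cur)
aligned = rotate(cur, shift)       # aligned now equals prev
```

The exhaustive search costs O(n²) per prototype pair; for long prototypes an FFT-based correlation would be the usual replacement.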
    • 2. Granted invention patent
    • Closed-loop multimode mixed-domain linear prediction (MDLP) speech coder
    • US06640209B1
    • 2003-10-28
    • US09259151
    • 1999-02-26
    • Amitava Das
    • Amitava Das
    • G10L19/04
    • G10L19/18
    • A closed-loop, multimode, mixed-domain linear prediction (MDLP) speech coder includes a high-rate, time-domain coding mode, a low-rate, frequency-domain coding mode, and a closed-loop mode-selection mechanism for selecting a coding mode for the coder based upon the speech content of frames input to the coder. Transition speech (i.e., from unvoiced speech to voiced speech, or vice versa) frames are encoded with the high-rate, time-domain coding mode, which may be a CELP coding mode. Voiced speech frames are encoded with the low-rate, frequency-domain coding mode, which may be a harmonic coding mode. Phase parameters are not encoded by the frequency-domain coding mode, and are instead modeled in accordance with, e.g., a quadratic phase model. For each speech frame encoded with the frequency-domain coding mode, the initial phase value is taken to be the initial phase value of the immediately preceding speech frame encoded with the frequency-domain coding mode. If the immediately preceding speech frame was encoded with the time-domain coding mode, the initial phase value of the current speech frame is computed from the decoded speech frame information of the immediately preceding, time-domain-encoded speech frame. Each speech frame encoded with the frequency-domain coding mode may be compared with the corresponding input speech frame to obtain a performance measure. If the performance measure falls below a predefined threshold value, the input speech frame is encoded with the time-domain coding mode.
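The closed-loop mode selection in the abstract above can be sketched as follows. This is an illustration only: the abstract does not name the performance measure, so a signal-to-noise ratio in dB is assumed here, and `freq_codec` and `time_codec` are hypothetical callables that stand in for an encode-then-decode round trip of each mode.

```python
import math


def snr_db(reference, decoded):
    """Signal-to-noise ratio of `decoded` against `reference`, in dB.

    Assumption: SNR is the performance measure; the abstract leaves it open.
    """
    signal = sum(x * x for x in reference)
    noise = sum((x - y) ** 2 for x, y in zip(reference, decoded))
    if noise == 0:
        return math.inf
    if signal == 0:
        return -math.inf
    return 10 * math.log10(signal / noise)


def select_mode(frame, frame_class, freq_codec, time_codec, threshold_db=10.0):
    """Pick a coding mode for one frame and return (mode, decoded_frame).

    Frames not classified as voiced go straight to the time-domain mode.
    Voiced frames try the frequency-domain mode first and fall back to
    the time-domain mode if the measure is below the threshold.
    """
    if frame_class != "voiced":
        return "time", time_codec(frame)
    candidate = freq_codec(frame)
    if snr_db(frame, candidate) < threshold_db:
        return "time", time_codec(frame)
    return "freq", candidate
```

The threshold of 10 dB is a placeholder, not a value taken from the patent.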
    • 3. Invention patent application
    • SPEAKER RECOGNITION VIA VOICE SAMPLE BASED ON MULTIPLE NEAREST NEIGHBOR CLASSIFIERS
    • US20120016673A1
    • 2012-01-19
    • US13246681
    • 2011-09-27
    • Amitava Das
    • Amitava Das
    • G10L17/00
    • G10L17/10
    • A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
    • 4. Granted invention patent
    • Speaker recognition via voice sample based on multiple nearest neighbor classifiers
    • US08050919B2
    • 2011-11-01
    • US11771794
    • 2007-06-29
    • Amitava Das
    • Amitava Das
    • G10L17/00
    • G10L17/10
    • A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
    • 5. Invention patent application
    • SPEAKER RECOGNITION VIA VOICE SAMPLE BASED ON MULTIPLE NEAREST NEIGHBOR CLASSIFIERS
    • US20090006093A1
    • 2009-01-01
    • US11771794
    • 2007-06-29
    • Amitava Das
    • Amitava Das
    • G10L15/00; G10L17/00
    • G10L17/10
    • A speaker recognition system generates a codebook store with codebooks representing voice samples of speakers, referred to as trainers. The speaker recognition system may use multiple classifiers and generate a codebook store for each classifier. Each classifier uses a different set of features of a voice sample as its features. A classifier inputs a voice sample of a person and tries to authenticate or identify the person. A classifier generates a sequence of feature vectors for the input voice sample and then a code vector for that sequence. The classifier uses its codebook store to recognize the person. The speaker recognition system then combines the scores of the classifiers to generate an overall score. If the score satisfies a recognition criterion, then the speaker recognition system indicates that the voice sample is from that speaker.
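The multi-classifier recognition described in the speaker recognition abstracts above can be illustrated with a small sketch. Several choices here are assumptions, because the abstracts do not fix them: the code vector of a sample is taken to be the mean of its feature vectors, a classifier's score for a speaker is the Euclidean distance to the nearest code vector in that speaker's codebook, scores are combined by summation, and the recognition criterion is an upper bound on the combined distance. All function and parameter names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Collapse a sequence of feature vectors into one code vector (assumed: the mean)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_distance(code_vector, codebook):
    """Distance from the code vector to its nearest neighbor in a codebook."""
    return min(math.dist(code_vector, entry) for entry in codebook)


def recognize(features_per_classifier, stores, max_total_distance):
    """Return the best-matching speaker, or None if the criterion fails.

    features_per_classifier: one feature-vector sequence per classifier.
    stores: one dict per classifier, mapping speaker name to a codebook.
    """
    totals = {}
    for features, store in zip(features_per_classifier, stores):
        code_vector = mean_vector(features)
        for speaker, codebook in store.items():
            distance = nearest_distance(code_vector, codebook)
            totals[speaker] = totals.get(speaker, 0.0) + distance
    best = min(totals, key=totals.get)
    return best if totals[best] <= max_total_distance else None


stores = [
    {"alice": [[0.0, 0.0]], "bob": [[5.0, 5.0]]},   # classifier 1, 2-D features
    {"alice": [[1.0]], "bob": [[9.0]]},             # classifier 2, 1-D features
]
sample = [[[0.0, 0.2], [0.2, 0.0]], [[1.5], [0.5]]]
speaker = recognize(sample, stores, max_total_distance=1.0)
```

Each classifier may use features of a different dimension, which is why the stores are kept separate rather than merged.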
    • 8. Granted invention patent
    • Method and apparatus for tracking the phase of a quasi-periodic signal
    • US06449592B1
    • 2002-09-10
    • US09259247
    • 1999-02-26
    • Amitava Das
    • Amitava Das
    • G10L19/14
    • G10L19/02
    • A method for tracking the phase of a quasi-periodic signal includes the steps of estimating the phase of the signal for frames during which the signal is periodic, monitoring the performance of the estimated phase with a closed-loop performance measure, and measuring the phase of the signal for frames during which the signal is periodic and performance of the estimated phase falls below a predefined threshold level. In estimating the phase, the initial phase value is set equal to the estimated final phase value of the previous frame if the previous frame was periodic. The initial phase value is set equal to a measured phase value of the previous frame if the previous frame was nonperiodic, or if the previous frame was periodic and performance of the estimated phase for the previous frame fell below the predefined threshold level. For frames during which the signal is nonperiodic, the phase of the signal is measured. An open-loop periodicity decision can be used to determine whether the signal is periodic for a given frame.
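The rule for choosing the initial phase in the abstract above reduces to a small decision function. The sketch below only encodes that rule. The `FrameResult` record and its field names are hypothetical, and how the phase is estimated or measured is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class FrameResult:
    """Per-frame bookkeeping (hypothetical record, not from the patent)."""
    periodic: bool                # open-loop periodicity decision
    estimated_final_phase: float  # final phase predicted by the model
    measured_phase: float         # phase measured from the signal
    performance: float            # closed-loop performance measure


def initial_phase_for_next(prev: FrameResult, threshold: float) -> float:
    """Initial phase for the frame that follows `prev`.

    Use the estimated final phase only if the previous frame was periodic
    and its estimate performed at or above the threshold. Otherwise fall
    back to the measured phase of the previous frame.
    """
    if prev.periodic and prev.performance >= threshold:
        return prev.estimated_final_phase
    return prev.measured_phase
```

Keeping the rule in one function makes the three cases of the abstract (nonperiodic, periodic with poor estimate, periodic with good estimate) easy to test separately.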
    • 9. Granted invention patent
    • Multipulse interpolative coding of transition speech frames
    • US06260017B1
    • 2001-07-10
    • US09307294
    • 1999-05-07
    • Amitava Das; Sharath Manjunath
    • Amitava Das; Sharath Manjunath
    • G10L19/10
    • G10L19/18; G10L19/10
    • A multipulse interpolative coder for transition speech frames includes an extractor configured to represent a first frame of transitional speech samples by a subset of the samples of the frame. The coder also includes an interpolator configured to interpolate the subset of samples and a subset of samples extracted from an earlier-received frame to synthesize other samples of the first frame that are not included in the subset. The subset of samples is further simplified by selecting a set of pulses from the subset and assigning zero values to unselected pulses. In the alternative, a portion of the unselected pulses may be quantized. The set of pulses may be the pulses having the greatest absolute amplitudes in the subset. In the alternative, the set of pulses may be the most perceptually significant pulses of the subset.
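One of the simplification options in the abstract above, keeping the pulses with the greatest absolute amplitudes and setting the rest to zero, can be sketched in a few lines. The function name `select_pulses` and the parameter `k` are hypothetical. The alternatives mentioned in the abstract (quantizing some unselected pulses, or selecting by perceptual significance) are not covered.

```python
def select_pulses(samples, k):
    """Keep the k samples with the largest absolute amplitude, zero the rest.

    Ties are resolved in favor of the earlier sample, because Python's
    sort is stable.
    """
    k = max(k, 0)
    order = sorted(range(len(samples)), key=lambda i: abs(samples[i]), reverse=True)
    keep = set(order[:k])
    return [s if i in keep else 0.0 for i, s in enumerate(samples)]


subset = [0.1, -0.9, 0.3, 0.7, -0.2]
pulses = select_pulses(subset, 2)   # [0.0, -0.9, 0.0, 0.7, 0.0]
```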
    • 10. Granted invention patent
    • Object identification and verification using transform vector quantization
    • US07991199B2
    • 2011-08-02
    • US11771879
    • 2007-06-29
    • Amitava Das
    • Amitava Das
    • G06K9/00
    • G06K9/6223; G06K9/6272
    • An identification system uses mappings of known objects to codebooks representing those objects to identify an object represented by multiple input representations or to verify that an input representation corresponds to an input known object. To identify the object, the identification system generates an input feature vector for each input representation. The identification system then accumulates for each known object the distances between the codebook of that object and each of the input feature vectors. The distance between a codebook and a feature vector may be the minimum of the distances between the code vectors of the codebook and the feature vector. The identification system then selects the object with the smallest accumulated distance as being the object represented by the multiple input representations.
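The identification procedure in the abstract above (accumulate, for each known object, the minimum codebook distance of every input feature vector, then pick the smallest total) maps directly onto a short sketch. Euclidean distance is an assumption, since the abstract does not name a metric, and the names `codebook_distance` and `identify` are hypothetical.

```python
import math


def codebook_distance(codebook, feature):
    """Minimum distance between a feature vector and any code vector."""
    return min(math.dist(code, feature) for code in codebook)


def identify(codebooks, features):
    """Return (best_object, totals) for a set of input feature vectors.

    codebooks: dict mapping object name to a list of code vectors.
    features: one feature vector per input representation.
    """
    totals = {
        name: sum(codebook_distance(codebook, f) for f in features)
        for name, codebook in codebooks.items()
    }
    return min(totals, key=totals.get), totals


codebooks = {
    "cat": [[0.0, 0.0], [1.0, 1.0]],
    "dog": [[5.0, 5.0], [6.0, 6.0]],
}
best, totals = identify(codebooks, [[0.9, 1.1], [0.1, -0.1]])
```

Returning the accumulated totals alongside the winner lets a caller inspect how close the runner-up was.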