    • 73. Published patent application
    • Systems, methods and articles of manufacture for performing high resolution N-best string hypothesization
    • EP0720147A1
    • 1996-07-03
    • EP95309005.7
    • 1995-12-12
    • AT&T Corp.
    • Chou, Wu; Lee, Chin-Hui; Juang, Biing-Hwang; Matsuoka, Tatsuo
    • G10L5/06; G10L5/00; G10L9/00; G10L7/08; G10L9/06; G10L9/18
    • G10L15/08; G10L15/197; G10L2015/025
    • Disclosed are systems, methods and articles of manufacture for performing high-resolution N-best string hypothesization during speech recognition. A received input signal representing a speech utterance is processed using a plurality of recognition models to generate one or more string hypotheses of the received input signal. The recognition models preferably include one or more inter-word context-dependent models and one or more language models. A forward partial path map is produced according to the allophonic specifications of at least one of the inter-word context-dependent models and the language models. The forward partial path map is traversed in the backward direction as a function of the allophonic specifications to generate the one or more string hypotheses. One or more of the recognition models may represent one-phone words.
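The forward-map, backward-traversal idea in this abstract can be illustrated with a small sketch. It assumes a word lattice given as hypothetical `(src, dst, word, log_score)` edges with nodes numbered in topological order, and it does not model the allophonic or inter-word context-dependent specifications. The forward pass stores the best partial score reaching each node; the backward pass is an A*-style search that uses that map as an exact look-ahead, so complete strings come out best first. `forward_partial_path_map` and `n_best_strings` are illustrative names, not the patent's.

```python
import heapq
from collections import defaultdict


def forward_partial_path_map(edges, start):
    """Best forward log score from `start` to every node.

    Assumes nodes are integers in topological order (src < dst for every edge),
    so processing edges by increasing source node is a valid forward pass.
    """
    best = defaultdict(lambda: float("-inf"))
    best[start] = 0.0
    for src, dst, _word, score in sorted(edges, key=lambda e: e[0]):
        if best[src] + score > best[dst]:
            best[dst] = best[src] + score
    return best


def n_best_strings(edges, start, final, n):
    """Backward A* over the lattice, using the forward map as an exact heuristic.

    Complete paths are popped in order of decreasing total log score;
    different paths that spell the same word string are reported once.
    """
    best = forward_partial_path_map(edges, start)
    incoming = defaultdict(list)
    for src, dst, word, score in edges:
        incoming[dst].append((src, word, score))

    counter = 0  # tie-breaker so the heap never has to compare word tuples
    heap = [(-best[final], counter, final, 0.0, ())]
    seen, results = set(), []
    while heap and len(results) < n:
        _neg_f, _c, node, g, words = heapq.heappop(heap)
        if node == start:
            if words not in seen:
                seen.add(words)
                results.append((words, g))
            continue
        for src, word, score in incoming[node]:
            if best[src] == float("-inf"):
                continue  # src cannot be reached from start
            counter += 1
            g_new = g + score  # backward partial score from final to src
            heapq.heappush(heap, (-(g_new + best[src]), counter, src, g_new, (word,) + words))
    return results
```

Because the forward scores are exact, no complete string is popped before a better one, and the `seen` set keeps only distinct word strings, which is what an N-best string list needs.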
    • 74. Published patent application
    • Reduction of search space in speech recognition using phone boundaries and phone ranking
    • EP0715298A1
    • 1996-06-05
    • EP95109575.1
    • 1995-06-21
    • International Business Machines Corporation
    • Nahamoo, David; Padmanabhan, Mukund
    • G10L5/06; G10L7/08; G10L9/06; G10L9/18
    • G10L15/04; G10L15/142; G10L2015/025; G10L2015/085
    • A method for estimating the probability of phone boundaries, as well as the accuracy of the acoustic modelling, in order to cut down the search space in a speech recognition system. The accuracy of the acoustic modelling is quantified by the rank of the correct phone. The invention includes a microphone for converting an utterance into an electrical signal. The signal from the microphone is processed by an acoustic processor and a label match, which finds the best-matched acoustic label prototype in the acoustic label prototype store. A probability distribution over phone boundaries is then produced for every time frame using the first decision tree described in the invention. These probabilities are compared with a threshold, and some time frames are identified as boundaries between phones. An acoustic score is computed for all phones between every given pair of hypothesized boundaries, and the phones are ranked on the basis of this score. The second decision tree is traversed for every time frame to obtain the worst-case rank of the correct phone at that time, and, using the previously computed phone score and phone rank, a shortlist of allowed phones is made up for every time frame. This information is used to select a subset of the acoustic word models in the store, and a fast acoustic word match processor matches the label string from the acoustic processor against this subset of abridged acoustic word models to produce an utterance signal. The utterance signal output by the fast acoustic word match processor comprises at least one word; in general, however, the fast acoustic word match processor outputs a number of candidate words. Each word signal produced by the fast acoustic word match processor is input into a word context match, which compares the word context with the language models in the store and outputs at least one candidate word.
      From the recognition candidates produced by the fast acoustic match and the language model, the detailed acoustic match matches the label string from the acoustic processor against the detailed acoustic word models in the store and outputs a word string corresponding to the utterance.
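The boundary-and-rank pruning steps can be sketched as below, under strong simplifying assumptions: the two decision trees are not modelled, so per-frame boundary probabilities and the worst-case rank of the correct phone are passed in directly, and the acoustic score of a phone over a segment is taken to be the sum of hypothetical per-frame log scores. `find_boundaries` and `phone_shortlists` are illustrative names.

```python
def find_boundaries(boundary_probs, threshold):
    """Frames whose boundary probability exceeds the threshold start a new segment."""
    return [t for t, p in enumerate(boundary_probs) if p > threshold]


def phone_shortlists(frame_scores, boundary_probs, worst_rank, threshold=0.5):
    """Return, for every frame, the set of allowed phone indices.

    frame_scores[t][ph]: acoustic log score of phone ph at frame t (assumed input).
    worst_rank[t]: worst-case (1-based) rank of the correct phone at frame t;
    in the patent this comes from a second decision tree, here it is an input.
    """
    n_frames = len(frame_scores)
    n_phones = len(frame_scores[0])
    # Segment edges: start of data, every hypothesized boundary, end of data.
    cuts = sorted(set([0] + find_boundaries(boundary_probs, threshold) + [n_frames]))
    shortlists = [set() for _ in range(n_frames)]
    for seg_start, seg_end in zip(cuts, cuts[1:]):
        # Score of each phone over the segment between two boundaries.
        seg = [sum(frame_scores[t][ph] for t in range(seg_start, seg_end))
               for ph in range(n_phones)]
        ranked = sorted(range(n_phones), key=lambda ph: seg[ph], reverse=True)
        for t in range(seg_start, seg_end):
            # Keep only phones ranked no worse than the correct phone's worst case.
            shortlists[t] = set(ranked[:worst_rank[t]])
    return shortlists
```

The per-frame shortlist is what would then restrict which acoustic word models the fast match has to consider.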
    • 76. Published patent application
    • Speech recognition using dynamic features
    • EP0689193A1
    • 1995-12-27
    • EP95102320.9
    • 1995-02-20
    • International Business Machines Corporation
    • Bahl, Lalit Rai; de Souza, Peter Vincent; Gopalakrishnan, Ponani; Picheny, Michael Alan
    • G10L5/06; G10L7/08; G10L9/06; G10L9/18
    • G10L19/0018; G10L15/02; G10L2015/025
    • A speech recognition technique uses a set of N different principal discriminant matrices, each associated with a distinct class. A class indicates the proximity of a speech segment to neighboring phones. The speech encoding technique arranges the speech signal into a series of frames and, for each frame, derives a feature vector representing the speech signal for a speech segment or series of speech segments. Multiplying each principal discriminant matrix by the feature vector yields a set of N different projected vectors for each frame. This encoding can be used in speech recognition systems whose models have each transition tagged with one of the N classes: the projected vector corresponding to a transition's tag is used to compute the probability that at least one particular speech part is present in that frame.
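A minimal sketch of the projection and scoring steps follows. It assumes the N matrices are already trained and that each tagged transition scores its projected vector with a diagonal Gaussian; the abstract does not specify the output distribution, so that choice is an assumption. Matrices and vectors are plain nested lists, and all names are hypothetical.

```python
import math


def project_all(matrices, x):
    """Multiply each class's principal discriminant matrix (list of rows) by feature vector x."""
    return [[sum(m * xi for m, xi in zip(row, x)) for row in matrix] for matrix in matrices]


def diag_gaussian_log_pdf(y, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed output distribution)."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (yi - mu) ** 2 / v
                      for yi, mu, v in zip(y, mean, var))


def transition_log_prob(projected, tag, mean, var):
    """Score a model transition tagged with class `tag` using the matching projection."""
    return diag_gaussian_log_pdf(projected[tag], mean, var)
```

All N projections are computed once per frame, so each transition only has to look up the vector for its own tag.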
    • 77. Published patent application
    • Speech recognition apparatus
    • EP0665532A2
    • 1995-08-02
    • EP95101309.3
    • 1995-01-31
    • NEC CORPORATION
    • Yamada, Eiko, c/o NEC Corporation; Hattori, Hiroaki, c/o NEC Corporation
    • G10L7/08
    • G10L15/20; G10L25/24; G10L25/27
    • Speech data is converted into logarithmic spectrum data and orthogonally transformed to produce feature vectors. Normalization coefficient data and unit vector data are stored. The inner product of the feature vector data and the stored unit vector data is calculated. A normalization unit regressively updates the inner product and, using the updated inner product, the normalization coefficient data, the unit vector data and the feature vectors, performs spectrum normalization with a curve of second or higher order on the orthogonally transformed feature vectors. Recognition is then performed on the normalized feature vectors.
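The abstract leaves the exact normalization formula open. The sketch below is one possible reading, stated as an assumption: the orthogonal transform is an orthonormal DCT-II of the log spectrum, the stored unit vector is the transformed and normalized quadratic curve over frequency bins, the inner product is updated recursively by exponential smoothing, and normalization subtracts the scaled curve component from each feature vector. `SpectrumNormalizer`, `curve_unit_vector`, `coeff` and `smoothing` are hypothetical names.

```python
import math


def dct2(v):
    """Orthonormal DCT-II, used here as the orthogonal transform (an assumption)."""
    n = len(v)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)))
    return out


def curve_unit_vector(n_bins, order=2):
    """Hypothetical stored unit vector: transformed, normalized curve (i/(n-1))**order; n_bins >= 2."""
    c = dct2([(i / (n_bins - 1)) ** order for i in range(n_bins)])
    norm = math.sqrt(sum(ci * ci for ci in c))
    return [ci / norm for ci in c]


class SpectrumNormalizer:
    def __init__(self, unit_vector, coeff=1.0, smoothing=0.9):
        self.u = unit_vector    # stored unit vector data
        self.coeff = coeff      # stored normalization coefficient (assumed scalar)
        self.alpha = smoothing  # weight of the previous estimate in the recursive update
        self.inner = 0.0        # recursively updated inner product

    def process(self, power_spectrum):
        # Log spectrum, then orthogonal transform, gives the feature vector.
        feature = dct2([math.log(p) for p in power_spectrum])
        ip = sum(f * u for f, u in zip(feature, self.u))
        self.inner = self.alpha * self.inner + (1 - self.alpha) * ip
        # Remove the scaled curve component from the feature vector.
        return [f - self.coeff * self.inner * u for f, u in zip(feature, self.u)]
```

With `smoothing=0.0` and `coeff=1.0` the current frame's curve component is removed completely; values of `smoothing` closer to 1 track a slowly varying spectral tilt over many frames.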
    • 78. Published patent application
    • Speech recognition apparatus
    • EP0660300A1
    • 1995-06-28
    • EP94120541.1
    • 1994-12-23
    • NEC CORPORATION
    • Keizaburo, Takagi, c/o NEC Corporation
    • G10L5/06; G10L7/08; G10L9/06; G10L9/18
    • G10L15/20; G10L2015/0635
    • A speech recognition apparatus according to the present invention includes an average vector calculating portion (5), a compensating portion (6) and a matching portion (8). Using matching information received from a preliminary matching portion (2), the average vector calculating portion (5) calculates an average vector for the noise region and for the speech region of both the input speech and the reference pattern, as received from a spectrum converting portion (4). The compensating portion (6) uses the average vectors calculated by the average vector calculating portion (5) to compensate the spectral time sequence of the input speech, of the reference pattern, or of both, so that the average vector of the noise region of the input speech matches that of the noise region of the reference pattern, and the average vector of the speech region of the input speech matches that of the speech region of the reference pattern. The matching portion (8) finally matches the reference pattern with the input speech and outputs a recognition result. Because the additive-noise and channel-distortion conditions of the input speech to be recognized are quickly matched to those of the reference pattern, the apparatus can recognize speech precisely without being influenced by environmental noise, even if the additive noise, the microphone and the transmission channel through which the input speech is collected are unknown at training time, and even if the additive noise and channel conditions vary from one input utterance to the next. The apparatus thus overcomes the drawbacks of conventional apparatuses.
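The compensation step can be sketched as a per-channel linear mapping of the input spectra. Two assumptions are made: frames are already labelled as noise or speech (the patent derives this from preliminary matching), and only the input side is compensated, although the abstract allows compensating the reference pattern instead or as well. The mapping shown is one that makes both average vectors match; it is not claimed to be the patent's formula, and the function names are hypothetical.

```python
def mean_vector(frames):
    """Component-wise mean of a list of equal-length spectral vectors."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def compensate_input(frames, is_speech, ref_noise_mean, ref_speech_mean):
    """Map input spectra so their noise and speech means equal the reference means.

    Per channel: y' = (y - N_in) * (S_ref - N_ref) / (S_in - N_in) + N_ref.
    This linear map is one choice that satisfies both constraints (an assumption).
    """
    noise_mean = mean_vector([f for f, s in zip(frames, is_speech) if not s])
    speech_mean = mean_vector([f for f, s in zip(frames, is_speech) if s])
    # Gain per channel; fall back to 1.0 when speech and noise means coincide.
    gains = [(sr - nr) / (si - ni) if si != ni else 1.0
             for ni, si, nr, sr in zip(noise_mean, speech_mean, ref_noise_mean, ref_speech_mean)]
    return [[(y - ni) * g + nr for y, ni, g, nr in zip(f, noise_mean, gains, ref_noise_mean)]
            for f in frames]
```

Since a linear map preserves averages, the compensated noise frames average to the reference noise vector and the compensated speech frames to the reference speech vector.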
    • 79. Published patent application
    • Method and arrangement for outputting digital speech signals
    • EP0637013A1
    • 1995-02-01
    • EP94110378.0
    • 1994-07-04
    • SIEMENS AKTIENGESELLSCHAFT
    • Clüver, Kai Dipl.-Ing.
    • G10L5/06; G10L7/04; G10L7/08; G10L9/06
    • G10L19/06; H04L12/6418; H04L2012/5616; H04L2012/5647; H04L2012/5671; H04L2012/6467; H04L2012/6481; H04Q11/0478
    • On receipt of a valid message packet transmitted over a communication network operating in Asynchronous Transfer Mode (ATM), a predictive analysis of the incoming speech signal (x(n)) is performed to compute filter coefficients (e(i), a(i)) for a digital input filter (PF) and a digital output filter (SF). The input filter (PF) forms a difference signal (d(n)) from the incoming speech signal (x(n)), and this signal is fed to the output filter (SF) as the excitation signal (e(n)). The input filter (PF) and the output filter (SF) have mutually inverse transfer functions, so that the output filter (SF) forms and outputs the speech signal (y(n)) as determined by the computed filter coefficients (e(i), a(i)). In addition to the predictive analysis of the incoming speech signal (x(n)), the excitation signal (e(n)) is analyzed to determine the period (M) of the fundamental frequency of the speech signal; using this period, and retaining the filter coefficients (a(i)), a substitute signal (e(n-M)) is formed by repeating the excitation signal (e(n)) of the most recently received message packet synchronously with the fundamental frequency. If a message packet is lost, the substitute signal (e(n-M)) is fed to the output filter (SF) as the signal (e(n)), and the output filter forms and outputs a substituted speech signal (y*(n)).
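The concealment scheme can be sketched with linear prediction. Assumptions: the predictive analysis is a per-packet autocorrelation LPC (Levinson-Durbin recursion), so the input filter is A(z) = 1 - Σ a(i) z^-i and the output filter is its inverse 1/A(z); the pitch period M is the lag with the largest excitation autocorrelation; and packets are plain lists of samples. ATM transport, packet validation and the look-ahead aspect are not modelled. `PacketConcealer` and the helper functions are hypothetical names.

```python
def autocorr(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def lpc(x, order):
    """Levinson-Durbin recursion: coefficients a with x(n) ~ sum_i a[i-1] * x(n-i)."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0:
            break  # silent or fully predicted packet: remaining coefficients stay zero
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1 - k * k
    return a[1:]


def pitch_period(e, min_lag, max_lag):
    """Lag with the largest autocorrelation of the excitation (a simple pitch estimate)."""
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, min(max_lag, len(e) - 1) + 1):
        val = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


class PacketConcealer:
    def __init__(self, order=4, min_lag=2, max_lag=20):
        self.order, self.min_lag, self.max_lag = order, min_lag, max_lag
        self.a = [0.0] * order       # filter coefficients a(i), kept across a loss
        self.x_hist = [0.0] * order  # input filter (PF) state, most recent first
        self.y_hist = [0.0] * order  # output filter (SF) state, most recent first
        self.exc = []                # excitation history e(n)
        self.period = min_lag        # pitch period M

    def _output_filter(self, e):
        """SF = 1/A(z): y(n) = e(n) + sum_i a(i) y(n-i)."""
        y = e + sum(ai * yi for ai, yi in zip(self.a, self.y_hist))
        self.y_hist = ([y] + self.y_hist)[:self.order]
        return y

    def receive(self, packet):
        """Valid packet: analyze, inverse-filter (PF), resynthesize (SF), update M."""
        if not packet:
            return []
        self.a = lpc(packet, self.order)
        out = []
        for x in packet:
            d = x - sum(ai * xi for ai, xi in zip(self.a, self.x_hist))  # PF = A(z)
            self.x_hist = ([x] + self.x_hist)[:self.order]
            self.exc.append(d)
            out.append(self._output_filter(d))
        self.period = pitch_period(self.exc[-len(packet):], self.min_lag, self.max_lag)
        return out

    def lose(self, length):
        """Lost packet: feed e(n - M) to SF while keeping the last coefficients."""
        if not self.exc:
            return [0.0] * length  # nothing received yet, so output silence
        m = min(self.period, len(self.exc))
        out = []
        for _ in range(length):
            e = self.exc[-m]
            self.exc.append(e)
            out.append(self._output_filter(e))
        return out
```

After `receive`, the output matches the input up to rounding because the two filters are inverses of each other; `lose` keeps the last coefficients and repeats the excitation with period M, mirroring the substitution described in the abstract.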