    • 1. Granted invention patent
    • Method of recognizing coherently spoken words
    • US5058166A
    • 1991-10-15
    • US523305
    • 1990-05-11
    • Hermann Ney; Andreas Noll
    • Hermann Ney; Andreas Noll
    • G10L15/00; G10L15/14
    • G10L15/00
    • During the recognition, speech values which are derived from sample values of the speech signals are compared with reference values, the words of a given vocabulary each time being given by a sequence of reference values. The words are then determined from phonemes according to a fixed pronouncing lexicon and the reference values for the phonemes are determined in a learning phase, each phoneme within a word consisting of a number of equal reference values determined in the learning phase. In order to approach transitions between phonemes, each phoneme may also consist of three sections of each time constant reference values. By the given number of reference values per phoneme, the time duration of a phoneme in a given word can be simulated more accurately. Different possibilities are indicated to determine the reference values and the distance value during the recognition.
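The abstract above describes building each word from phonemes via a fixed pronouncing lexicon, with each phoneme contributing a number of equal reference values. A minimal sketch of that idea follows. The lexicon entries, the two-dimensional reference vectors, and the repeat counts are all hypothetical placeholders, not values from the patent.

```python
from typing import Dict, List, Tuple

Vector = Tuple[float, ...]

# Hypothetical pronouncing lexicon: word -> phoneme sequence.
LEXICON: Dict[str, List[str]] = {"no": ["n", "ow"], "on": ["aa", "n"]}

# Hypothetical results of a learning phase: one reference vector per phoneme
# and the number of times that vector is repeated inside a word.
PHONEME_REFERENCE: Dict[str, Vector] = {
    "n": (0.1, 0.9),
    "ow": (0.7, 0.2),
    "aa": (0.5, 0.5),
}
PHONEME_REPEATS: Dict[str, int] = {"n": 2, "ow": 4, "aa": 3}


def word_reference_sequence(word: str) -> List[Vector]:
    """Expand a word into its sequence of reference values.

    Each phoneme contributes PHONEME_REPEATS[phoneme] identical copies of
    its reference vector, which roughly models the phoneme's duration.
    """
    sequence: List[Vector] = []
    for phoneme in LEXICON[word]:
        sequence.extend([PHONEME_REFERENCE[phoneme]] * PHONEME_REPEATS[phoneme])
    return sequence
```

Under these assumptions, `word_reference_sequence("no")` yields two copies of the vector for "n" followed by four copies of the vector for "ow".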
    • 2. Granted invention patent
    • Single-count backing-off method of determining N-gram language model values
    • US5745876A
    • 1998-04-28
    • US642012
    • 1996-05-02
    • Reinhard Kneser; Hermann Ney
    • Reinhard Kneser; Hermann Ney
    • G10L15/197; G10L5/06
    • G10L15/197
    • For the recognition of coherently spoken speech with a large vocabulary, language model values which take into account the probability of word sequences are considered at word transitions. Prior to the recognition, these language model values are derived on the basis of training speech signals. If the amount of training data is kept within sensible limits, not all word sequences will actually occur, so that the language model values for, for example an N-gram language model must be determined from word sequences of N-1 words actually occurring. In accordance with the invention, these reduced word sequences from each different, complete word sequence are counted only once, irrespective of the actual frequency of occurrence of the complete word sequence or only reduced training sequences which occur exactly once in the training data are taken into account.
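The first counting variant in the abstract above (each reduced word sequence is counted once per distinct complete sequence, regardless of how often the complete sequence occurs) can be sketched as follows. This is an illustration only: the function names are hypothetical, and it assumes the reduced sequence is obtained by dropping the earliest word of the N-gram.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(words: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-word sequences in order of occurrence."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def single_count_reduced_counts(words: Sequence[str], n: int) -> Counter:
    """Count each reduced (n-1)-gram once per distinct complete n-gram.

    Assumption: the reduced sequence is the n-gram without its first word.
    """
    distinct_complete = set(ngrams(words, n))
    return Counter(gram[1:] for gram in distinct_complete)


corpus = "the cat the cat a cat".split()
counts = single_count_reduced_counts(corpus, 2)
```

In this toy corpus "cat" occurs three times, but it follows only two distinct predecessors ("the" and "a"), so its single-count value is 2 rather than 3.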
    • 3. Granted invention patent
    • Method for determining the variation with time of a speech parameter and arrangement for carrying out the method
    • US4813075A
    • 1989-03-14
    • US125101
    • 1987-11-24
    • Hermann Ney
    • Hermann Ney
    • G10L25/90; G10L5/00
    • G10L25/90
    • In a speech or speaker recognition system, a segment or sequence of speech parameter values is smoothed to a most probable sequence by dynamic programming. The method for determining the variation with time of a speech parameter is based on a speech signal which is subdivided into successive segments and an individual value exists in each segment and for each value of the parameter within a limited range of values. For the example of the fundamental voice frequency, a value has been generated in each speech segment with the aid of the AMDF (Average Magnitude Difference Function). The required variation now links a sequence of horizontally, vertically or diagonally directly adjacent speech parameter values to one another in such a manner that the sum of the associated individual values represents a minimum. In this arrangement, this sum is slightly magnified in diagonal or vertical sections since a horizontal variation is most probable. This magnification is controlled by certain fixed values which influence the smoothness of the variation.
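The dynamic-programming smoothing described in the abstract above can be illustrated with a small sketch. It makes several simplifying assumptions that are not taken from the patent: `cost[t][k]` stands in for the individual value (for example an AMDF value) of candidate `k` in segment `t`, only moves between consecutive segments are modelled (vertical moves within a segment are omitted), and the "magnification" of non-horizontal moves is approximated by a fixed additive `penalty`.

```python
from typing import List, Sequence


def smooth_contour(cost: Sequence[Sequence[float]], penalty: float = 0.1) -> List[int]:
    """Return the candidate index per segment that minimizes the summed cost.

    Moving to a neighbouring index (k-1 or k+1) between segments adds
    `penalty`; staying on the same index (a horizontal move) is free.
    """
    num_candidates = len(cost[0])
    total = [list(cost[0])]            # best cumulative cost per candidate
    back: List[List[int]] = []         # back-pointers for each later segment

    for t in range(1, len(cost)):
        row, pointers = [], []
        for k in range(num_candidates):
            best_prev, best_val = k, total[-1][k]      # horizontal move
            for j in (k - 1, k + 1):                   # diagonal moves
                if 0 <= j < num_candidates and total[-1][j] + penalty < best_val:
                    best_prev, best_val = j, total[-1][j] + penalty
            row.append(best_val + cost[t][k])
            pointers.append(best_prev)
        total.append(row)
        back.append(pointers)

    # Trace back from the cheapest final candidate.
    k = min(range(num_candidates), key=lambda i: total[-1][i])
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


example_cost = [[0.0, 1.0, 1.0], [0.2, 0.0, 1.0], [0.0, 1.0, 1.0]]
```

With a large penalty the path stays on one candidate; with no penalty it follows the per-segment minima, which shows how the fixed value controls the smoothness of the contour.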
    • 4. Granted invention patent
    • Method of deriving characteristics values from a speech signal
    • US6041296A
    • 2000-03-21
    • US843808
    • 1997-04-21
    • Lutz Welling; Hermann Ney
    • Lutz Welling; Hermann Ney
    • G10L13/00; G10L15/00; G10L19/06; G10L1/00
    • G10L19/06; G10L13/00; G10L15/00; G10L25/06; G10L25/21
    • In a frequently used speech synthesis for voice output an excitation signal is applied to a number of resonators whose frequency and amplitude are adjusted in accordance with the sound to be produced. These parameters for adjusting the resonators may be gained from natural speech signals. Such parameters gained from natural speech signals may also be used for speech recognition, in which these parameter values are compared with comparison values. According to the invention, the parameters, particularly the formant frequencies, are determined by forming the power density spectrum via discrete frequencies from which autocorrelation coefficients are formed for consecutive frequency segments of the power density spectrum from which, in turn, error values are formed, while the sum of the error values is minimized over all segments and the optimum boundary frequencies of the segments are determined for this minimum. Via the autocorrelation coefficients, the LPC predictor coefficients can then be computed, from which coefficients the formant frequency is computed. The minimum of the error sum for the individual segments is found by way of dynamic programming, in which auxiliary values are initially computed from the power density spectrum and stored as Tables from which the autocorrelation coefficients are easily determined for individual frequency segments which are required for the computations in the dynamic programming process.
    • 5. Granted invention patent
    • Speech recognition apparatus and method using look-ahead scoring
    • US5956678A
    • 1999-09-21
    • US425304
    • 1995-04-17
    • Reinhold Hab-Umbach; Hermann Ney
    • Reinhold Hab-Umbach; Hermann Ney
    • G10L15/28; G10L15/08; G10L15/18; G10L5/06
    • G10L15/08
    • In the recognition of coherently spoken words, a plurality of hypotheses is usually built up which end in various words during the recognition process and are then to be continued with further words. To keep the number of words yet to be continued as small as possible, especially in the case of a large vocabulary, it is known to carry out a look-ahead in a limited time space. It is suggested according to the invention to use the same phonemes for the look-ahead as for the actual recognition and to add together the differential sums obtained in the look-ahead for the evaluation of the partial hypothesis which has just ended and which is to be continued, and to compare this sum with a threshold value which depends on the extrapolated minimum total evaluation at the end of the time space of the look-ahead. The searching space for hypotheses to be continued can be limited by this in a particularly favorable manner.
    • 6. Granted invention patent
    • Measuring mis-match between signals
    • US4471453A
    • 1984-09-11
    • US304294
    • 1981-09-21
    • Hermann Ney; Rudolf Geppert
    • Hermann Ney; Rudolf Geppert
    • G10L15/00; G10L15/12; G06F15/34
    • G10L15/12; G10L15/00
    • The degree of mis-match which would be obtained between a test and a reference signal, for example speech signals, should their time-axes be subjected to that relative shift and/or distortion which is required to minimize the degree of mismatch is carried out by sampling the two signals at regular intervals and storing these samples in memories. All the samples of one signal are then read out in succession from one memory; each successive sample of the other signal is read out from the other memory, and the difference between each pair of samples is formed in a subtractor. Each difference value from the subtractor is added to the smallest of three quantities, X, Y and Z and the result stored in a register to form the new quantity X. This new quantity X is also stored in a further memory cyclically addressed in tandem with the memory 12. The quantity Z is the previous content of the location of memory into which the new quantity X is written, and the quantity Y is the previous value of the quantity Z, obtained by shifting the previous value of the quantity Z in a two-stage shift register. Thus the quantities X, Y and Z correspond to the present sample of the other signal and the immediately preceding sample of the one signal, the immediately preceding sample of both signals, and the present sample of the one signal and the immediately preceding sample of the other signal respectively. The result is that each successive value of the quantity X is the optimum cumulative distance value between the two signals for the corresponding pair of samples should these be assigned to each other when matching the two signals to each other. The process is continued until all the sampled values have been read and paired, at which point the value of the quantity X is a measure of the above-defined degree of mis-match.
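The recurrence in the abstract above (each sample difference is added to the smallest of three previously accumulated quantities) has the same shape as classic dynamic time warping. The sketch below is a software illustration under assumptions, not the patented circuit: samples are plain numbers, the local difference is an absolute difference, both signals are non-empty, and the names `left`, `diagonal`, and `above` are descriptive stand-ins rather than the patent's X, Y, and Z. Like the described arrangement, it keeps only one row of accumulated values in memory.

```python
from typing import Sequence


def mismatch(test: Sequence[float], reference: Sequence[float]) -> float:
    """Return the minimum cumulative distance between two sampled signals."""
    inf = float("inf")
    previous_row = [inf] * len(reference)   # accumulated values for the prior test sample

    for i, test_sample in enumerate(test):
        current_row = []
        for j, ref_sample in enumerate(reference):
            difference = abs(test_sample - ref_sample)
            if i == 0 and j == 0:
                best = 0.0                   # start of both signals
            else:
                left = current_row[j - 1] if j > 0 else inf      # same test sample, prior reference sample
                diagonal = previous_row[j - 1] if j > 0 else inf  # prior sample of both signals
                above = previous_row[j]                           # prior test sample, same reference sample
                best = min(left, diagonal, above)
            current_row.append(difference + best)
        previous_row = current_row

    # The last accumulated value pairs the final samples of both signals.
    return previous_row[-1]
```

For example, `mismatch([1, 2, 3], [1, 2, 2, 3])` is zero because the repeated sample can be absorbed by warping the time axis, while `mismatch([1, 3], [1, 2, 3])` leaves a residual distance of 1.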
    • 8. Granted invention patent
    • Method and apparatus for recognizing spoken words in a speech signal
    • US5613034A
    • 1997-03-18
    • US312495
    • 1994-09-26
    • Hermann Ney; Volker Steinbiss
    • Hermann Ney; Volker Steinbiss
    • G10L15/08; G10L15/197; G10L9/00
    • G10L15/08; G10L15/197
    • In the recognition of coherent speech, language models are favourably used to increase the reliability of recognition, which models, for example, take into account the probabilities of word combinations, especially of word pairs. For this purpose, a language model value corresponding to this probability is added at boundaries between words. In several recognition methods, for example, when the vocabulary is built up from phonemes in the shape of a tree, it is not known at the start of the continuation of a hypothesis after a word end which word will actually follow, so that a language model value cannot be taken into account until at the end of the next word. Measures are given for achieving this in such a manner that as far as possible the optimal preceding word or the optimal preceding word sequence is taken into account for the language model value without the necessity of constructing a copy of the searching tree for each and every simultaneously ending preceding word sequence.
    • 10. Granted invention patent
    • Method and apparatus for recognizing spoken words in a speech signal by organizing the vocabulary in the form of a tree
    • US5995930A
    • 1999-11-30
    • US751377
    • 1996-11-19
    • Reinhold Hab-Umbach; Hermann Ney
    • Reinhold Hab-Umbach; Hermann Ney
    • G10L15/08; G10L15/12; G10L15/187; G10L5/06
    • G10L15/187; G10L15/08
    • A method and apparatus for processing a sequence of words in a speech signal for speech recognition. The method includes the steps of sampling, at recurrent instants, said speech signal for generating a series of test signals. Signal-by-signal matching and scoring is generated between the test signals and a series of reference signals, where each of the series of reference signals forms one of a plurality of vocabulary words arranged as a vocabulary tree. The vocabulary tree includes a root and a plurality of tree branches wherein any tree branch has a predetermined number of reference signals and is assigned to a speech element and any vocabulary word is assigned to a particular branch junction or branch end. Acoustic recombination determines both continuations of branches and the most probable partial hypotheses within a word because of the use of a vocabulary built up as a tree with branches having reference signals. At least one complete word for a particular test signal is determined, and, separately, for each completed word, there is: I) a word result formed including a word score and an aggregate score, said aggregate score derived from said word score and from a language model value assigned to a combination of said completed word and a uniform-length string of prior completed words.
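The vocabulary tree in the abstract above, where branches correspond to speech elements and words are attached to branch junctions or branch ends, can be pictured as a prefix tree over phoneme sequences. The sketch below is only an illustration: the class name, the function names, and the toy lexicon are hypothetical, and the reference signals attached to each branch are left out.

```python
from typing import Dict, List


class TreeNode:
    """A junction or end of a branch in the vocabulary tree."""

    def __init__(self) -> None:
        self.children: Dict[str, "TreeNode"] = {}   # phoneme -> next node
        self.words: List[str] = []                  # words completed at this node


def build_vocabulary_tree(lexicon: Dict[str, List[str]]) -> TreeNode:
    """Merge words that share initial phonemes into common branches."""
    root = TreeNode()
    for word, phonemes in lexicon.items():
        node = root
        for phoneme in phonemes:
            node = node.children.setdefault(phoneme, TreeNode())
        node.words.append(word)
    return root


def count_branches(node: TreeNode) -> int:
    """Count the branches (phoneme arcs) below a node."""
    return sum(1 + count_branches(child) for child in node.children.values())


# Hypothetical toy lexicon.
toy_lexicon = {"cat": ["k", "ae", "t"], "can": ["k", "ae", "n"], "a": ["ae"]}
tree = build_vocabulary_tree(toy_lexicon)
```

Here "cat" and "can" share the branches for "k" and "ae", so the tree needs 5 branches instead of the 7 phonemes of a flat word list, which is the kind of sharing that keeps the search space small for a large vocabulary.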