Quick Patent Search: fast search of global patents, free patent database for commercial use (IPRDB)

121. Granted invention patent

US08019602B2 Automatic speech recognition learning using user corrections (in force)
Publication No.: US08019602B2
Publication date: 2011-09-13
Application No.: US10761451
Filing date: 2004-01-20
Applicant: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
Inventor: Dong Yu, Peter Mau, Mei-Yuh Hwang, Alejandro Acero
IPC classification: G10L15/00, G10L15/26, G10L21/00
CPC classification: G10L15/065, G10L15/063, G10L2015/0631
Abstract: An automatic speech recognition system recognizes user changes to dictated text and infers whether such changes result from the user changing his/her mind, or whether such changes are a result of a recognition error. If a recognition error is detected, the system uses the type of user correction to modify itself to reduce the chance that such recognition error will occur again. Accordingly, the system and methods provide for significant speech recognition learning with little or no additional user interaction.

122. Invention patent application

US20110161078A1 PITCH MODEL FOR NOISE ESTIMATION (in force)
Publication No.: US20110161078A1
Publication date: 2011-06-30
Application No.: US13042000
Filing date: 2011-03-07
Applicant: James G. Droppo, Alejandro Acero, Luis Buera
Inventor: James G. Droppo, Alejandro Acero, Luis Buera
IPC classification: G10L15/20
CPC classification: G10L21/02
Abstract: Pitch is tracked for individual samples, which are taken much more frequently than an analysis frame. Speech is identified based on the tracked pitch and the speech components of the signal are removed with a time-varying filter, leaving only an estimate of a time-varying noise signal. This estimate is then used to generate a time-varying noise model which, in turn, can be used to enhance speech related systems.

123. Granted invention patent

US07941316B2 Combined speech and alternate input modality to a mobile device (in force)
Publication No.: US07941316B2
Publication date: 2011-05-10
Application No.: US11262230
Filing date: 2005-10-28
Applicant: Milind V. Mahajan, Alejandro Acero, Bo-June Hsu
Inventor: Milind V. Mahajan, Alejandro Acero, Bo-June Hsu
IPC classification: G10L15/26
CPC classification: G10L15/22
Abstract: A method of entering information into a mobile device includes receiving a multi-word speech input from a user, performing speech recognition on the speech input to obtain a multi-word speech recognition result, and sequentially displaying, in a display, words in the speech recognition result for user confirmation or correction, by adding one word at a time to the display. A next word is only displayed after user confirmation or correction has been received for a previously displayed word that is immediately preceding the next word in the speech recognition result. The method also includes calculating a hypothesis lattice indicative of a plurality of speech recognition hypotheses based on the speech input and, prior to finishing calculating the hypothesis lattice and while continuing to calculate the hypothesis lattice, calculating a preliminary hypothesis lattice indicative of only partial speech recognition hypotheses based on the speech input and outputting the preliminary hypothesis lattice.
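
The word-by-word confirmation step in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `confirm_sequentially` and the `review_word` callback are hypothetical, and the sketch covers only the sequential display rule, not the incremental hypothesis-lattice computation.

```python
def confirm_sequentially(recognized_words, review_word):
    """Reveal recognized words one at a time.

    `review_word(shown_so_far, candidate)` is a hypothetical callback that
    stands in for the user: it returns the candidate unchanged (confirmation)
    or a replacement word (correction).
    """
    displayed = []
    for candidate in recognized_words:
        # The next word is reached only after this call returns, i.e. after
        # the immediately preceding word has been confirmed or corrected.
        final_word = review_word(list(displayed), candidate)
        displayed.append(final_word)
    return displayed


# Usage: simulate a user who corrects one misrecognized word.
corrections = {"wreck": "recognize"}
result = confirm_sequentially(
    ["please", "wreck", "speech"],
    lambda shown, word: corrections.get(word, word),
)
```

Here the callback is a plain dictionary lookup. In an interactive setting it would block on keypad or touch input, which is where the alternate input modality named in the title would come in.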

124. Granted invention patent

US07877256B2 Time synchronous decoding for long-span hidden trajectory model (in force)
Publication No.: US07877256B2
Publication date: 2011-01-25
Application No.: US11356905
Filing date: 2006-02-17
Applicant: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
Inventor: Xiaolong Li, Li Deng, Dong Yu, Alejandro Acero
IPC classification: G10L15/14
CPC classification: G10L15/08
Abstract: A time-synchronous lattice-constrained search algorithm is developed and used to process a linguistic model of speech that has a long-contextual-span capability. In the algorithm, hypotheses are represented as traces that include an indication of a current frame, previous frames and future frames. Each frame can include an associated linguistic unit such as a phone or units that are derived from a phone. Additionally, pruning strategies can be applied to speed up the search. Further, word-ending recombination methods are developed to speed up the computation. These methods can effectively deal with an exponentially increased search space.

125. Invention patent application

US20100161332A1 TRAINING WIDEBAND ACOUSTIC MODELS IN THE CEPSTRAL DOMAIN USING MIXED-BANDWIDTH TRAINING DATA FOR SPEECH RECOGNITION (under examination, published)
Publication No.: US20100161332A1
Publication date: 2010-06-24
Application No.: US12719626
Filing date: 2010-03-08
Applicant: Michael L. Seltzer, Alejandro Acero
Inventor: Michael L. Seltzer, Alejandro Acero
IPC classification: G10L15/06
CPC classification: G10L15/063, G10L15/02, G10L25/24
Abstract: A method and apparatus are provided that use narrowband data and wideband data to train a wideband acoustic model.

126. Invention patent application

US20100149310A1 VISUAL FEEDBACK FOR NATURAL HEAD POSITIONING (in force)
Publication No.: US20100149310A1
Publication date: 2010-06-17
Application No.: US12336534
Filing date: 2008-12-17
Applicant: Zhengyou Zhang, Christian Huitema, Alejandro Acero
Inventor: Zhengyou Zhang, Christian Huitema, Alejandro Acero
IPC classification: H04N7/15
CPC classification: H04N7/147, H04N7/15, H04N21/42203, H04N21/4223, H04N21/4318, H04N21/44218, H04N21/4788
Abstract: A videoconferencing conferee may be provided with feedback on his or her location relative to a local video camera by altering how remote videoconference video is displayed on a local videoconference display viewed by the conferee. The conferee's location may be tracked and the displayed remote video may be altered in accordance with the changing location of the conferee. The remote video may appear to move in directions mirroring movement of the conferee. This effect may be achieved by modeling the remote video as offset and behind a virtual portal corresponding to the display. The remote video may be displayed according to a view of the remote video through the virtual portal. As the conferee's position changes, the view through the portal changes, and the remote video changes accordingly.

127. Invention patent application

US20100076757A1 ADAPTING A COMPRESSED MODEL FOR USE IN SPEECH RECOGNITION (in force)
Publication No.: US20100076757A1
Publication date: 2010-03-25
Application No.: US12235748
Filing date: 2008-09-23
Applicant: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
Inventor: Jinyu Li, Li Deng, Dong Yu, Jian Wu, Yifan Gong, Alejandro Acero
IPC classification: G10L15/20
CPC classification: G10L15/20, G10L15/065
Abstract: A speech recognition system includes a receiver component that receives a distorted speech utterance. The speech recognition system also includes an adaptor component that selectively adapts parameters of a compressed model used to recognize at least a portion of the distorted speech utterance, wherein the adaptor component selectively adapts the parameters of the compressed model based at least in part upon the received distorted speech utterance.

128. Invention patent application

US20090323924A1 ACOUSTIC ECHO SUPPRESSION (in force)
Publication No.: US20090323924A1
Publication date: 2009-12-31
Application No.: US12145579
Filing date: 2008-06-25
Applicant: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu
Inventor: Ivan J. Tashev, Alejandro Acero, Nilesh Madhu
IPC classification: H04M9/08
CPC classification: H04M9/082
Abstract: Sound signals captured by a microphone are adjusted to provide improved sound quality. More particularly, an Acoustic Echo Reduction system which performs a first stage of echo reduction (e.g., acoustic echo cancellation) on a received signal is configured to perform a second stage of echo reduction (e.g., acoustic echo suppression) by segmenting the received signal into a plurality of frequency bins respectively comprised within a number of frames (e.g., 0.3 s to 0.5 s sound signal segments) for a given block. Data comprised within respective frequency bins is modeled according to a probability density function (e.g., Gaussian distribution). The probability of whether respective frequency bins comprise predominantly near-end signal or predominantly residual echo is calculated. The output of the acoustic echo suppression is computed as a product of the content of a frequency bin in a frame and the probability that the frequency bin in a frame comprises predominantly near-end signal, thereby making near-end signals more prominent than residual echoes.
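
The final step of this abstract (output = bin content × probability that the bin is predominantly near-end signal) can be sketched in a few lines of Python. The abstract only says the bin data is modeled with a probability density function such as a Gaussian. The two-class Bayes posterior, the equal default prior, the `(mean, variance)` model tuples, and all function names below are assumptions made for illustration, not details taken from the patent.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def near_end_probability(magnitude, near_model, echo_model, prior_near=0.5):
    """Posterior probability that a bin is dominated by near-end signal.

    near_model and echo_model are assumed (mean, variance) tuples describing
    the bin magnitude under each hypothesis.
    """
    p_near = prior_near * gaussian_pdf(magnitude, *near_model)
    p_echo = (1.0 - prior_near) * gaussian_pdf(magnitude, *echo_model)
    total = p_near + p_echo
    # Fall back to the prior if both likelihoods underflow to zero.
    return p_near / total if total > 0.0 else prior_near


def suppress_frame(frame_bins, near_model, echo_model):
    """Scale each frequency bin by its near-end probability."""
    return [
        b * near_end_probability(abs(b), near_model, echo_model)
        for b in frame_bins
    ]


# Usage: a strong bin (likely near-end speech) and a weak bin (likely residual echo).
out = suppress_frame([1.0, 0.1], near_model=(1.0, 0.04), echo_model=(0.1, 0.04))
```

With these illustrative parameters the strong bin passes almost unchanged while the weak bin is attenuated toward zero, which is the "near-end more prominent than residual echo" effect the abstract describes.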

129. Granted invention patent

US07634406B2 System and method for identifying semantic intent from acoustic information (in force)
Publication No.: US07634406B2
Publication date: 2009-12-15
Application No.: US11009630
Filing date: 2004-12-10
Applicant: Xiao Li, Asela J. Gunawardana, Alejandro Acero, Milind Mahajan, Dong Yu
Inventor: Xiao Li, Asela J. Gunawardana, Alejandro Acero, Milind Mahajan, Dong Yu
IPC classification: G10L15/06
CPC classification: G10L15/19, G10L15/1815
Abstract: In accordance with one embodiment of the present invention, unanticipated semantic intents are discovered in audio data in an unsupervised manner. For instance, the audio acoustics are clustered based on semantic intent and representative acoustics are chosen for each cluster. A human then need only listen to a small number of representative acoustics for each cluster (and possibly only one per cluster) in order to identify the unforeseen semantic intents.
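
The "choose representative acoustics for each cluster" step can be illustrated with a small Python sketch. The abstract does not state how utterances are represented or how representatives are selected. The sketch assumes each utterance has already been clustered and mapped to a fixed-length feature vector, and it picks the medoid under Euclidean distance as the representative. The function names and the data layout are hypothetical.

```python
import math


def medoid_index(vectors):
    """Index of the vector with the smallest total distance to all others."""
    def total_distance(i):
        return sum(math.dist(vectors[i], v) for v in vectors)
    return min(range(len(vectors)), key=total_distance)


def pick_representatives(clusters):
    """Map each cluster label to the id of its medoid utterance.

    `clusters` is assumed to be {label: {utterance_id: feature_vector}}.
    """
    representatives = {}
    for label, members in clusters.items():
        ids = list(members)
        vectors = [members[i] for i in ids]
        representatives[label] = ids[medoid_index(vectors)]
    return representatives


# Usage: a reviewer would listen only to the returned utterances.
clusters = {
    "cluster_a": {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (10.0, 0.0)},
    "cluster_b": {"u4": (5.0, 5.0)},
}
reps = pick_representatives(clusters)
```

A medoid is used rather than a centroid because it is always an actual utterance that a person can listen to, whereas an averaged feature vector is not.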

130. Granted invention patent

US07624006B2 Conditional maximum likelihood estimation of naïve bayes probability models (in force)
Publication No.: US07624006B2
Publication date: 2009-11-24
Application No.: US10941399
Filing date: 2004-09-15
Applicant: Ciprian Chelba, Alejandro Acero
Inventor: Ciprian Chelba, Alejandro Acero
IPC classification: G06F17/27, G06F17/20, G06F17/30
CPC classification: G10L15/1822, G06N7/005, Y10S707/99936
Abstract: A statistical classifier is constructed by estimating Naïve Bayes classifiers such that the conditional likelihood of class given word sequence is maximized. The classifier is constructed using a rational function growth transform implemented for Naïve Bayes classifiers. The estimation method tunes the model parameters jointly for all classes such that the classifier discriminates between the correct class and the incorrect ones for a given training sentence or utterance. Optional parameter smoothing and/or convergence speedup can be used to improve model performance. The classifier can be integrated into a speech utterance classification system or other natural language processing system.
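
For orientation, the conditional-likelihood criterion named in this abstract can be written out in generic form. The notation below ($x_i$ for a training word sequence with features $x_{i,k}$, $y_i$ for its class, $\theta$ for the model parameters, $N$ for the number of training examples) is an assumption chosen for illustration, and the patent's exact parameterization and its rational function growth transform updates are not reproduced here.

```latex
% Conditional maximum likelihood: maximize the probability of the correct
% class given the input, rather than the joint likelihood of (x, y).
\theta^{*} = \arg\max_{\theta} \sum_{i=1}^{N} \log P(y_i \mid x_i; \theta),
\qquad
P(y \mid x; \theta) =
  \frac{P(y; \theta)\, \prod_{k} P(x_k \mid y; \theta)}
       {\sum_{y'} P(y'; \theta)\, \prod_{k} P(x_k \mid y'; \theta)}
```

The denominator sums over all classes, which is why the parameters must be tuned jointly for all classes: raising the score of the correct class only helps if the scores of the competing classes for the same sentence do not rise as much.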
