Quick Patent Search - Search global patents quickly, a free patent database for commercial use - IPRDB

1. Invention application

US20050075881A1 Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing (status: in force)
Publication (announcement) No.: US20050075881A1
Publication (announcement) date: 2005-04-07
Application No.: US10677174
Filing date: 2003-10-02
Applicant: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Inventor: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
IPC classification: G10L15/26, G10L21/00
CPC classification: G06F17/30796, G10L15/26
Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.
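
The abstract above ties tagging to the "close temporal relation" between user speech and media capture. A minimal sketch of that idea follows, assuming recognized text arrives with a timestamp and each media item records its capture time. The names `Media` and `tag_by_time` and the 5-second window `max_gap_s` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


@dataclass
class Media:
    """A captured media item (hypothetical structure for illustration)."""
    name: str
    captured_at: float  # capture time in seconds
    tags: list = field(default_factory=list)


def tag_by_time(media, utterances, max_gap_s=5.0):
    """Attach each recognized utterance to the media item captured closest
    in time, but only if the gap is within max_gap_s seconds.

    utterances is a list of (spoken_at_seconds, recognized_text) pairs.
    The window size is an assumed parameter, not a value from the source.
    """
    for spoken_at, text in utterances:
        nearest = min(
            media,
            key=lambda m: abs(m.captured_at - spoken_at),
            default=None,
        )
        if nearest is not None and abs(nearest.captured_at - spoken_at) <= max_gap_s:
            nearest.tags.append(text)
    return media
```

Speech that falls outside the window for every item is simply left unattached here; a real device might instead queue it for the post-processing step the abstract mentions.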

2. Invention application

US20050071159A1 SPEECH RECOGNIZER PERFORMANCE IN CAR AND HOME APPLICATIONS UTILIZING NOVEL MULTIPLE MICROPHONE CONFIGURATIONS (status: in force)
Publication (announcement) No.: US20050071159A1
Publication (announcement) date: 2005-03-31
Application No.: US10672167
Filing date: 2003-09-26
Applicant: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
Inventor: Robert Boman, Luca Rigazio, Brian Hanson, Rathinavelu Chengalvarayan
IPC classification: G10L15/20, G10L21/02, G10L15/00
CPC classification: G10L21/0208, G10L15/20, G10L2021/02166
Abstract: System speakers are switched to function as sound input transducers to improve recognizer performance and to support recognizer features. A crossbar switch is selectively activated, either manually or under software control, to allow system loudspeakers to function as sound input transducers that supplement the recognition system microphone or microphone array. Using loudspeakers as “microphones” improves speech recognition in noisy environments, thus attaining better recognition performance with little added system cost. The loudspeakers, positioned in physically separate locations also provide spatial information that can be used to determine the location of the person speaking and thereby offer different functionality for different persons. Acoustic models are selected based on environmental and vehicle operating conditions and may be adapted dynamically using ambient information obtained using the loudspeakers as sound input transducers.

3. Granted invention patent

US06697778B1 Speaker verification and speaker identification based on a priori knowledge (status: in force)
Publication (announcement) No.: US06697778B1
Publication (announcement) date: 2004-02-24
Application No.: US09610495
Filing date: 2000-07-05
Applicant: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Inventor: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
IPC classification: G10L15/06
CPC classification: G10L17/02
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.

4. Granted invention patent

US5946649A Esophageal speech injection noise detection and rejection (status: lapsed)
Publication (announcement) No.: US5946649A
Publication (announcement) date: 1999-08-31
Application No.: US843452
Filing date: 1997-04-16
Applicant: Hector Raul Javkin, Michael Galler, Nancy Niedzielski, Robert Boman
Inventor: Hector Raul Javkin, Michael Galler, Nancy Niedzielski, Robert Boman
IPC classification: G10L21/02, G10L11/00, G10L3/02
CPC classification: G10L21/0364, G10L2021/0575
Abstract: The present invention eliminates injection noise in speech produced by esophageal speakers. A speech input signal is digitized. One copy of the digitized signal is used for analysis and the other is passed through a gain switch to an amplifier as output. A Fast Fourier Transform and a mean value of the digitized speech input signal is calculated. The Fast Fourier Transform (FFT) is passed through a morphological filter to produce a filtered spectrum. An occurrence of injection noise is detected by calculating a derivative of the filtered spectrum and determining from the mean value and the derivative a location and value of a largest peak and a second largest peak in the filtered spectrum. If the largest peak is lower in frequency than the second largest peak, and if all points above 2 KHz are less than the mean, then an occurrence of injection noise has been detected. An occurrence of silence is detected by center-clipping the filtered spectrum and determining whether there is any energy within a sliding 10 millisecond window for a predetermined amount of time. If no energy is detected within a sliding 10 millisecond window for a predetermined amount time, then an occurrence of silence has been detected. The output speech signal is passed after the occurrence of injection noise has been detected; and is blocked following an occurrence of silence.
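
The injection-noise decision rule in the abstract above (two largest peaks, their frequency order, and the 2 kHz condition) can be sketched as follows. This is an illustration only and makes several assumptions: the input is a magnitude spectrum that has already been through the FFT and the morphological filter, the "mean" is taken over that filtered spectrum, peaks are located where the first difference changes sign, and only peaks above the mean are considered. The function names and the `bin_hz` parameter are hypothetical.

```python
def find_peaks(spectrum):
    """Return indices where the first difference goes from positive to
    non-positive, i.e. local maxima of the spectrum."""
    diff = [spectrum[i + 1] - spectrum[i] for i in range(len(spectrum) - 1)]
    return [i for i in range(1, len(spectrum) - 1)
            if diff[i - 1] > 0 and diff[i] <= 0]


def is_injection_noise(filtered_spectrum, bin_hz, cutoff_hz=2000.0):
    """Apply the two conditions stated in the abstract to one frame.

    filtered_spectrum: magnitudes per frequency bin (assumed pre-filtered).
    bin_hz: width of one bin in Hz (assumed; bin i sits at i * bin_hz).
    """
    mean = sum(filtered_spectrum) / len(filtered_spectrum)
    # Assumption: only peaks that rise above the mean count as candidates.
    peaks = [i for i in find_peaks(filtered_spectrum)
             if filtered_spectrum[i] > mean]
    if len(peaks) < 2:
        return False
    ranked = sorted(peaks, key=lambda i: filtered_spectrum[i], reverse=True)
    largest, second = ranked[0], ranked[1]
    # Condition 1: the largest peak lies at a lower frequency than the second.
    largest_is_lower = largest < second
    # Condition 2: every point above the cutoff is below the mean.
    high_band_quiet = all(
        value < mean
        for i, value in enumerate(filtered_spectrum)
        if i * bin_hz > cutoff_hz
    )
    return largest_is_lower and high_band_quiet
```

The silence detector and the gain switch described in the abstract are left out; they would consume this per-frame decision to open and close the output path.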

5. Granted invention patent

US09064161B1 System and method for detecting generic items in image sequence (status: in force)
Publication (announcement) No.: US09064161B1
Publication (announcement) date: 2015-06-23
Application No.: US11811211
Filing date: 2007-06-08
Applicant: Robert Boman, Luis Goncalves, James Ostrowski
Inventor: Robert Boman, Luis Goncalves, James Ostrowski
IPC classification: G06K15/00, G06K7/01, G06K7/00
CPC classification: G06K7/01, G06K7/00, G06K9/00711, G06K9/4676, G06K2209/17, G07G1/0063, G07G3/003
Abstract: A system and method for detecting the presence of known or unknown objects based on visual features is disclosed. In the preferred embodiment, the system is a checkout system for detecting items of merchandise on a shopping cart. The merchandise checkout system preferably includes a feature extractor for extracting visual features from a plurality of images; a motion detector configured to detect one or more groups of the visual features present in at least two of the plurality of images; a classifier to classify each of said groups of the visual features based on one or more classification criteria, wherein each of the one or more parameters is associated with one of said groups of visual features; and an alarm configured to generate an alert if the one or more parameters for any of said groups of the visual features satisfy one or more classification criteria.

6. Granted invention patent

US07324943B2 Voice tagging, voice annotation, and speech recognition for portable devices with optional post processing (status: in force)
Publication (announcement) No.: US07324943B2
Publication (announcement) date: 2008-01-29
Application No.: US10677174
Filing date: 2003-10-02
Applicant: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
Inventor: Luca Rigazio, Robert Boman, Patrick Nguyen, Jean-Claude Junqua
IPC classification: G10L21/00, H04N5/76
CPC classification: G06F17/30796, G10L15/26
Abstract: A media capture device has an audio input receptive of user speech relating to a media capture activity in close temporal relation to the media capture activity. A plurality of focused speech recognition lexica respectively relating to media capture activities are stored on the device, and a speech recognizer recognizes the user speech based on a selected one of the focused speech recognition lexica. A media tagger tags captured media with generated speech recognition text, and a media annotator annotates the captured media with a sample of the user speech that is suitable for input to a speech recognizer. Tagging and annotating are based on close temporal relation between receipt of the user speech and capture of the captured media. Annotations may be converted to tags during post processing, employed to edit a lexicon using letter-to-sound rules and spelled word input, or matched directly to speech to retrieve captured media.

7. Invention application

US20050114357A1 Collaborative media indexing system and method (status: under examination, published)
Publication (announcement) No.: US20050114357A1
Publication (announcement) date: 2005-05-26
Application No.: US10718471
Filing date: 2003-11-20
Applicant: Rathinavelu Chengalvarayan, Philippe Morin, Robert Boman, Ted Applebaum
Inventor: Rathinavelu Chengalvarayan, Philippe Morin, Robert Boman, Ted Applebaum
IPC classification: G06F7/00, G06F17/30, G11B27/034, G11B27/10, G11B27/30
CPC classification: G11B27/3027, G06F16/48, G11B27/034, G11B27/105
Abstract: An indexing system for tagging a media stream is provided. The indexing system includes a plurality of inputs for defining at least one tag. A tagging system assigns the tag to the media stream. A tag analysis system selectively distributes tags for review and editing by members of the collaborative group. A tag database stores the tag and the media stream. Retrieval architecture can search the database using the tags.

8. Granted invention patent

US06895257B2 Personalized agent for portable devices and cellular phone (status: in force)
Publication (announcement) No.: US06895257B2
Publication (announcement) date: 2005-05-17
Application No.: US10077904
Filing date: 2002-02-18
Applicant: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
Inventor: Robert Boman, Kirill Stoimenov, Roland Kuhn, Jean-Claude Junqua
IPC classification: H04M1/27, H04M1/725, H04M3/493, H04M3/533, H04Q7/20
CPC classification: H04M3/53366, H04M1/271, H04M1/72547, H04M1/72552, H04M3/4938, H04M2201/60, H04M2203/4536, H04M2250/74
Abstract: Personalized agent services are provided in a personal messaging device, such as a cellular telephone or personal digital assistant, through services of a speech recognizer that converts speech into text and a text-to-speech synthesizer that converts text to speech. Both recognizer and synthesizer may be server-based or locally deployed within the device. The user dictates an e-mail message which is converted to text and stored. The stored text is sent back to the user as text or as synthesized speech, to allow the user to edit the message and correct transcription errors before sending as e-mail. The system includes a summarization module that prepares short summaries of incoming e-mail and voice mail. The user may access these summaries, and retrieve and organize email and voice mail using speech commands.

9. Invention application

US20050010411A1 Speech data mining for call center management (status: under examination, published)
Publication (announcement) No.: US20050010411A1
Publication (announcement) date: 2005-01-13
Application No.: US10616006
Filing date: 2003-07-09
Applicant: Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Inventor: Luca Rigazio, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
IPC classification: G10L15/26, G10L17/00, G10L15/00
CPC classification: G10L15/26, G10L17/00
Abstract: A speech data mining system for use in generating a rich transcription having utility in call center management includes a speech differentiation module differentiating between speech of interacting speakers, and a speech recognition module improving automatic recognition of speech of one speaker based on interaction with another speaker employed as a reference speaker. A transcript generation module generates a rich transcript based on recognized speech of the speakers. Focused, interactive language models improve recognition of a customer on a low quality channel using context extracted from speech of a call center operator on a high quality channel with a speech model adapted to the operator. Mined speech data includes number of interaction turns, customer frustration phrases, operator polity, interruptions, and/or contexts extracted from speech recognition results, such as topics, complaints, solutions, and resolutions. Mined speech data is useful in call center and/or product or service quality management.

10. Granted invention patent

US06480819B1 Automatic search of audio channels by matching viewer-spoken words against closed-caption/audio content for interactive television (status: in force)
Publication (announcement) No.: US06480819B1
Publication (announcement) date: 2002-11-12
Application No.: US09258115
Filing date: 1999-02-25
Applicant: Robert Boman, Jean-Claude Junqua
Inventor: Robert Boman, Jean-Claude Junqua
IPC classification: G06F17/27
CPC classification: G10L15/26, G10L15/1815
Abstract: A method and apparatus is provided to enable a user watching and/or listening to a program to search for new information in the stream of a telecommunications data. The apparatus includes a voice recognition system that recognizes the user's request and causes a search to be performed in the long stream of data of at least one other telecommunication channel. The system includes a storage device for storing and processing the request. Upon recognition of the request, the incoming signal or signals are scanned for matches with the request. Upon finding the match between the request and the incoming signal, information related to the data is brought to the viewer's attention. This can be accomplished by either changing the viewer's station or by bringing in a split screen display forward into the display.
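
The scanning step in the abstract above, where incoming signals are checked for matches with the recognized request, might look roughly like this when the incoming data is closed-caption text. This is a sketch under assumptions: the request has already been recognized into words, each channel exposes its current caption text as a string, and matching is plain case-insensitive word overlap. The function name and the dictionary layout are hypothetical.

```python
def find_matching_channels(request_words, captions_by_channel):
    """Return {channel: sorted list of matched words} for every channel
    whose caption text contains at least one word of the request.

    Matching is case-insensitive and splits captions on whitespace only;
    punctuation handling is deliberately left out of this sketch.
    """
    wanted = {word.lower() for word in request_words}
    hits = {}
    for channel, caption in captions_by_channel.items():
        found = wanted & set(caption.lower().split())
        if found:
            hits[channel] = sorted(found)
    return hits
```

A caller could then switch to the best-matching channel or raise a split-screen view, which are the two presentation options the abstract names.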
