    • 102. Invention Grant
    • Statistical unit selection language models based on acoustic fingerprinting
    • Publication number: US09424835B2
    • Publication date: 2016-08-23
    • Application number: US14850249
    • Filing date: 2015-09-10
    • Assignee: Google Inc.
    • Inventors: Alexander Gutkin, Javier Gonzalvo Fructuoso, Cyril Georges Luc Allauzen
    • IPC: G10L15/08, G10L15/06, G10L19/018, G10L13/08
    • CPC: G10L15/063, G10L13/08, G10L19/018
    • Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for providing statistical unit selection language modeling based on acoustic fingerprinting. The methods, systems and apparatus include the actions of obtaining a unit database of acoustic units and, for each acoustic unit, linguistic data corresponding to the acoustic unit; obtaining stored data associating each acoustic unit with (i) a corresponding acoustic fingerprint and (ii) a probability of the linguistic data corresponding to the acoustic unit occurring in a text corpus; determining that the unit database of acoustic units has been updated to include one or more new acoustic units; for each new acoustic unit in the updated unit database: generating an acoustic fingerprint for the new acoustic unit; identifying an acoustic unit that (i) has an acoustic fingerprint that is indicated as similar to the fingerprint of the new acoustic unit, and (ii) has a stored associated probability.
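The abstract describes updating a unit-selection database by fingerprinting each new acoustic unit and borrowing the corpus probability of the most similar already-known unit. The Python sketch below is a heavily simplified illustration of that flow: the quantised-energy fingerprint, the frame-match similarity measure, and all names (AcousticUnit, UnitDatabase, add_new_units) are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass

@dataclass
class AcousticUnit:
    unit_id: str
    samples: list          # raw audio samples of the unit
    linguistic_data: str   # e.g. the phone/word sequence the unit realises

def fingerprint(samples, frame=160, levels=8):
    """Toy acoustic fingerprint: coarsely quantised per-frame energy."""
    energies = [sum(s * s for s in samples[i:i + frame])
                for i in range(0, len(samples), frame)]
    peak = max(energies) or 1.0
    return tuple(int(levels * e / peak) for e in energies)

def similarity(fp_a, fp_b):
    """Fraction of quantised frames that match between two fingerprints."""
    matches = sum(1 for a, b in zip(fp_a, fp_b) if a == b)
    return matches / max(len(fp_a), len(fp_b))

class UnitDatabase:
    def __init__(self):
        self.units = {}           # unit_id -> AcousticUnit
        self.fingerprints = {}    # unit_id -> fingerprint
        self.probabilities = {}   # unit_id -> P(linguistic data in text corpus)

    def add_new_units(self, new_units, threshold=0.8):
        """For each new unit, inherit the probability of the most similar known unit."""
        for unit in new_units:
            fp = fingerprint(unit.samples)
            best_id, best_sim = None, 0.0
            for known_id, known_fp in self.fingerprints.items():
                if known_id in self.probabilities:
                    sim = similarity(fp, known_fp)
                    if sim > best_sim:
                        best_id, best_sim = known_id, sim
            self.units[unit.unit_id] = unit
            self.fingerprints[unit.unit_id] = fp
            if best_id is not None and best_sim >= threshold:
                self.probabilities[unit.unit_id] = self.probabilities[best_id]

# Illustrative usage with made-up audio and probability values.
db = UnitDatabase()
known = AcousticUnit("u1", samples=[0.1 * i for i in range(320)], linguistic_data="hello")
db.units["u1"] = known
db.fingerprints["u1"] = fingerprint(known.samples)
db.probabilities["u1"] = 0.02
db.add_new_units([AcousticUnit("u2", samples=[0.1 * i for i in range(320)], linguistic_data="hallo")])
print(db.probabilities["u2"])   # inherits 0.02 from the most similar known unit
```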
    • 105. Invention Application
    • VEHICLE AND CONTROL METHOD THEREOF
    • Publication number: US20160111089A1
    • Publication date: 2016-04-21
    • Application number: US14709139
    • Filing date: 2015-05-11
    • Assignee: HYUNDAI MOTOR COMPANY
    • Inventor: Hyung Jin KIM
    • IPC: G10L15/22, G10L15/28, G10L15/06, G10L15/00, G10L15/10
    • CPC: G10L15/22, G10L13/08, G10L2015/223
    • Abstract: A vehicle for recognizing received voice based on a language set in an external apparatus includes: a communication unit configured to receive text data stored in an external apparatus; a data converter configured to convert the received text data into voice data; a speech input unit configured to receive a speech from a user; a speech recognizer configured to recognize the received speech based on a language set in the external apparatus; and a controller configured to search for voice data corresponding to the recognized speech in the converted voice data, to generate a control command including the voice data found by the controller based on the recognized speech, and to transmit the control command to the external apparatus through the communication unit.
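The abstract enumerates a pipeline of communication unit, data converter, speech input unit, speech recognizer, and controller. The sketch below mocks that pipeline in Python; the text-to-voice conversion and the recognition step are placeholder string operations standing in for real TTS/ASR, and every class, method, and parameter name (VehicleVoiceController, handle_speech, external_language) is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ControlCommand:
    action: str
    voice_data: str   # the matched entry, e.g. a contact name

class VehicleVoiceController:
    """Minimal sketch of the claimed flow; all components are simulated."""

    def __init__(self, external_language="ko-KR"):
        self.external_language = external_language  # language set on the external apparatus
        self.voice_index = {}                        # text entry -> converted voice data

    def receive_text_data(self, entries):
        """Communication unit: text data (e.g. contacts) received from the external apparatus."""
        for text in entries:
            self.voice_index[text] = self.text_to_voice(text)

    def text_to_voice(self, text):
        """Data converter: stand-in for real TTS; returns a comparable token form."""
        return text.lower().split()

    def recognize(self, speech_audio):
        """Speech recognizer: stand-in for ASR using the external language setting."""
        # A real system would decode audio with a model for self.external_language.
        return speech_audio.lower().split()

    def handle_speech(self, speech_audio, action="call"):
        """Controller: match recognized speech against the converted voice data."""
        recognized = self.recognize(speech_audio)
        for text, voice in self.voice_index.items():
            if voice == recognized:
                return self.transmit(ControlCommand(action=action, voice_data=text))
        return None

    def transmit(self, command):
        """Communication unit: send the control command back to the external apparatus."""
        return command

controller = VehicleVoiceController(external_language="ko-KR")
controller.receive_text_data(["Kim Soo Hyun", "Service Center"])
print(controller.handle_speech("kim soo hyun"))   # ControlCommand(action='call', ...)
```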
    • 107. Invention Grant
    • Content creation support apparatus, method and program
    • Publication number: US09304987B2
    • Publication date: 2016-04-05
    • Application number: US14301378
    • Filing date: 2014-06-11
    • Assignee: KABUSHIKI KAISHA TOSHIBA
    • Inventors: Kosei Fume, Masahiro Morita
    • IPC: G10L15/00, G06F17/27, G10L13/08, G10L15/26, G10L13/033
    • CPC: G06F17/2755, G10L13/033, G10L13/08, G10L15/26
    • Abstract: According to one embodiment, a content creation support apparatus includes a speech synthesis unit, a speech recognition unit, an extraction unit, a detection unit, a presentation unit and a selection unit. The speech synthesis unit performs a speech synthesis on a first text. The speech recognition unit performs a speech recognition on the synthesized speech to obtain a second text. The extraction unit extracts feature values by performing a morphological analysis on each of the first and second texts. The detection unit compares a first feature value of a first difference string and a second feature value of a second difference string. The presentation unit presents correction candidate(s) according to the second feature value. The selection unit selects one of the correction candidates in accordance with an instruction from a user.
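The abstract describes a synthesize-then-recognize round trip whose text differences drive correction candidates. The Python sketch below illustrates only that comparison step: the round trip and the morphological analysis are simulated placeholders, and the function names (synthesize_then_recognize, detect_differences, correction_candidates) are assumptions made for illustration, not the patent's own interfaces.

```python
import difflib

def synthesize_then_recognize(first_text):
    """Stand-in for the TTS + ASR round trip; a real system would go through audio."""
    # Simulated misrecognition of a homophone.
    return first_text.replace("write", "right")

def morphological_features(tokens):
    """Stand-in for morphological analysis: (surface form, crude category guess)."""
    return [(tok, "verb" if tok.endswith("e") else "other") for tok in tokens]

def detect_differences(first_text, second_text):
    """Return (first difference string, second difference string) pairs."""
    a, b = first_text.split(), second_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    return [(a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]

def correction_candidates(first_text):
    """Present candidates for spans where the recognized text diverges."""
    second_text = synthesize_then_recognize(first_text)
    candidates = []
    for first_diff, second_diff in detect_differences(first_text, second_text):
        candidates.append({
            "span": " ".join(first_diff),
            "recognized_as": " ".join(second_diff),
            "first_features": morphological_features(first_diff),
            "second_features": morphological_features(second_diff),
        })
    return candidates   # a UI would then let the user select one candidate

print(correction_candidates("please write the report"))
```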
    • 109. Invention Grant
    • Speech recognition assisted evaluation on text-to-speech pronunciation issue detection
    • Publication number: US09293129B2
    • Publication date: 2016-03-22
    • Application number: US13785573
    • Filing date: 2013-03-05
    • Assignee: Microsoft Technology Licensing, LLC
    • Inventors: Pei Zhao, Bo Yan, Lei He, Zhe Geng, Yiu-Ming Leung
    • IPC: G10L13/08
    • CPC: G10L13/086, G10L13/08
    • Abstract: Pronunciation issues for synthesized speech are automatically detected using human recordings as a reference within a Speech Recognition Assisted Evaluation (SRAE) framework including a Text-To-Speech flow and a Speech Recognition (SR) flow. A pronunciation issue detector evaluates results obtained at multiple levels of the TTS flow and the SR flow (e.g. phone, word, and signal level) by using the corresponding human recordings as the reference for the synthesized speech, and outputs possible pronunciation issues. A signal level may be used to determine similarities/differences between the recordings and the TTS output. A model level checker may provide results to the pronunciation issue detector to check the similarities of the TTS and the SR phone set including mapping relations. Results from a comparison of the SR output and the recordings may also be evaluated by the pronunciation issue detector. The pronunciation issue detector outputs a list of potential pronunciation issue candidates.
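The abstract describes comparing TTS output against a human recording at several levels and collecting candidate issues. The sketch below illustrates a two-level version of that idea in Python; the phone-level diff, the energy-envelope distance, the threshold value, and all function names (detect_pronunciation_issues and friends) are simplified assumptions for illustration, not the SRAE framework itself.

```python
import difflib

def phone_level_diff(tts_phones, recording_phones):
    """Phone spans where the TTS phone sequence diverges from the reference recording."""
    matcher = difflib.SequenceMatcher(a=tts_phones, b=recording_phones)
    return [(tts_phones[i1:i2], recording_phones[j1:j2])
            for op, i1, i2, j1, j2 in matcher.get_opcodes() if op != "equal"]

def signal_level_distance(tts_energy, rec_energy):
    """Crude stand-in for signal-level comparison: mean absolute energy gap."""
    n = min(len(tts_energy), len(rec_energy))
    return sum(abs(a - b) for a, b in zip(tts_energy, rec_energy)) / max(n, 1)

def detect_pronunciation_issues(word, tts_phones, rec_phones,
                                tts_energy, rec_energy, signal_threshold=0.25):
    """Collect candidate issues from the phone-level and signal-level checks."""
    issues = []
    for tts_seg, rec_seg in phone_level_diff(tts_phones, rec_phones):
        issues.append({"word": word, "level": "phone",
                       "tts": tts_seg, "reference": rec_seg})
    if signal_level_distance(tts_energy, rec_energy) > signal_threshold:
        issues.append({"word": word, "level": "signal",
                       "detail": "energy envelope differs from recording"})
    return issues   # the list of potential pronunciation issue candidates

print(detect_pronunciation_issues(
    word="record",
    tts_phones=["r", "ih", "k", "ao", "r", "d"],
    rec_phones=["r", "eh", "k", "er", "d"],
    tts_energy=[0.2, 0.8, 0.6, 0.4],
    rec_energy=[0.3, 0.5, 0.4, 0.2],
))
```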
    • 110. Invention Grant
    • Methods and apparatus for predicting prosody in speech synthesis
    • Publication number: US09286886B2
    • Publication date: 2016-03-15
    • Application number: US13012740
    • Filing date: 2011-01-24
    • Inventors: Stephen Minnis, Andrew P. Breen
    • IPC: G10L13/08, G10L13/10
    • CPC: G10L13/10, G10L13/08
    • Abstract: Techniques for predicting prosody in speech synthesis may make use of a data set of example text fragments with corresponding aligned spoken audio. To predict prosody for synthesizing an input text, the input text may be compared with the data set of example text fragments to select a best matching sequence of one or more example text fragments, each example text fragment in the sequence being paired with a portion of the input text. The selected example text fragment sequence may be aligned with the input text, e.g., at the word level, such that prosody may be extracted from the audio aligned with the example text fragments, and the extracted prosody may be applied to the synthesis of the input text using the alignment between the input text and the example text fragments.
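The abstract describes covering the input text with matching example fragments and carrying their prosody over at the word level. The Python sketch below illustrates a greedy version of that matching; the tiny EXAMPLE_FRAGMENTS table with its pitch/duration values, the prefix-overlap matching rule, and the function names are all assumptions made for illustration only.

```python
# Each example fragment pairs its words with per-word prosody extracted from the
# aligned audio (illustrative pitch-in-Hz / duration-in-seconds pairs).
EXAMPLE_FRAGMENTS = [
    (["good", "morning"],     [(180, 0.30), (150, 0.45)]),
    (["how", "are", "you"],   [(200, 0.20), (170, 0.25), (140, 0.40)]),
    (["morning", "everyone"], [(160, 0.35), (130, 0.50)]),
]

def best_fragment_for(words):
    """Pick the example fragment with the longest matching word prefix."""
    def overlap(fragment_words):
        n = 0
        while n < min(len(words), len(fragment_words)) and words[n] == fragment_words[n]:
            n += 1
        return n
    return max(EXAMPLE_FRAGMENTS, key=lambda frag: overlap(frag[0]))

def predict_prosody(text):
    """Greedy left-to-right cover of the input text with example fragments."""
    words = text.lower().split()
    targets = []          # (word, (pitch, duration)) aligned at the word level
    i = 0
    while i < len(words):
        frag_words, frag_prosody = best_fragment_for(words[i:])
        matched = 0
        while (matched < len(frag_words) and i + matched < len(words)
               and words[i + matched] == frag_words[matched]):
            targets.append((words[i + matched], frag_prosody[matched]))
            matched += 1
        if matched == 0:                       # no example fragment covers this word
            targets.append((words[i], None))   # fall back to default prosody
            matched = 1
        i += matched
    return targets

print(predict_prosody("Good morning everyone"))
```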