Quick Patent Search - search global patents quickly, free patent database for commercial use - IPRDB

21. Patent application

US20130013313A1 STATISTICAL ENHANCEMENT OF SPEECH OUTPUT FROM A STATISTICAL TEXT-TO-SPEECH SYNTHESIS SYSTEM (lapsed)
Publication No.: US20130013313A1
Publication date: 2013-01-10
Application No.: US13177577
Filing date: 2011-07-07
Applicants: Slava Shechtman, Alexander Sorin
Inventors: Slava Shechtman, Alexander Sorin
IPC classification: G10L13/08
CPC classification: G10L13/033, G10L13/06
Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations. The instance of the corrective transformation may be applied to the feature vector to provide an enhanced feature vector.
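
The abstract leaves the transformation family and the distortion indicator open. The sketch below is one possible reading, not the patented method: it assumes the family is a scaling around a center, y = center + alpha * (x - center), that the distortion indicator is the variance of one feature dimension, and that the reference value 0.08 comes from the phonetic unit's model. All names and numbers are hypothetical.

```python
import math
from statistics import pvariance


def enhancing_parameter(reference_value, actual_value):
    """Scale factor that maps the actual variance onto the reference variance."""
    if actual_value <= 0.0:
        return 1.0  # nothing to rescale; fall back to the identity transformation
    return math.sqrt(reference_value / actual_value)


def corrective_transformation(alpha, center):
    """One instance of the assumed family y = center + alpha * (x - center)."""
    def apply(x):
        return center + alpha * (x - center)
    return apply


# One dimension of the feature vectors emitted for a single phonetic unit
# (made-up numbers, for illustration only).
emitted = [1.0, 1.2, 0.8, 1.1, 0.9]
center = sum(emitted) / len(emitted)
actual_value = pvariance(emitted)   # distortion indicator measured on the output
reference_value = 0.08              # assumed value attributed to the unit's model

alpha = enhancing_parameter(reference_value, actual_value)
enhance = corrective_transformation(alpha, center)
enhanced = [enhance(x) for x in emitted]
```

With these assumptions the enhanced values have the reference variance, which is the sense in which the enhancing parameter depends on both the reference and the actual value of the indicator.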

22. Granted patent

US07881930B2 ASR-aided transcription with segmented feedback training (in force)
Publication No.: US07881930B2
Publication date: 2011-02-01
Application No.: US11767537
Filing date: 2007-06-25
Applicants: Alexander Faisman, Alexander Sorin
Inventors: Alexander Faisman, Alexander Sorin
IPC classification: G10L15/26
CPC classification: G10L15/065
Abstract: An ASR-aided transcription system with segmented feedback training is provided, the system including a transcription process manager configured to extract a first segment and a second segment from an audio input of speech uttered by a speaker, and an ASR engine configured to operate in a first speech recognition mode to convert the first speech segment into a first text transcript using a speaker-independent acoustic model and a speaker-independent language model, operate in a first training mode to create a speaker-specific acoustic model and a speaker-specific language model by adapting the speaker-independent acoustic model and the speaker-independent language model using either of the first segment and a corrected version of the first text transcript, and operate in a second speech recognition mode to convert the second speech segment into a second text transcript using the speaker-specific acoustic model and the speaker-specific language model.
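
The control flow in this abstract (recognize, correct, adapt, recognize again) can be sketched with a toy engine. This is only an illustration of the sequence of modes: `ToyASREngine` is hypothetical, "audio" is represented as a list of sound tokens, and the acoustic and language models are collapsed into a single lookup table, which a real recognizer would not do.

```python
class ToyASREngine:
    """Stand-in for a real recognizer; the 'model' is a token-to-word table."""

    def __init__(self, speaker_independent_model):
        self.si_model = dict(speaker_independent_model)
        self.speaker_model = None  # created by the training mode

    def recognize(self, segment, use_speaker_model=False):
        """Recognition mode: map each sound token to a word."""
        if use_speaker_model and self.speaker_model is not None:
            model = self.speaker_model
        else:
            model = self.si_model
        return [model.get(token, "<unk>") for token in segment]

    def train(self, segment, corrected_transcript):
        """Training mode: adapt a copy of the speaker-independent model."""
        adapted = dict(self.si_model)
        adapted.update(zip(segment, corrected_transcript))
        self.speaker_model = adapted


def transcribe(engine, first_segment, second_segment, correct):
    """Run the three modes in order; `correct` stands for the human editor."""
    draft = engine.recognize(first_segment)
    corrected = correct(draft)
    engine.train(first_segment, corrected)
    second = engine.recognize(second_segment, use_speaker_model=True)
    return corrected, second


engine = ToyASREngine({"hh-ah-l-ow": "hello", "s-ao-r-ih-n": "soaring"})
first, second = transcribe(
    engine,
    first_segment=["hh-ah-l-ow", "s-ao-r-ih-n"],
    second_segment=["s-ao-r-ih-n", "hh-ah-l-ow"],
    correct=lambda draft: ["hello", "Sorin"],  # editor fixes the second word
)
```

The point of the sketch is that the correction made on the first segment carries over to the second segment through the adapted model.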

23. Granted patent

US07233894B2 Low-frequency band noise detection (in force)
Publication No.: US07233894B2
Publication date: 2007-06-19
Application No.: US10373258
Filing date: 2003-02-24
Applicant: Alexander Sorin
Inventor: Alexander Sorin
IPC classification: G10L11/04
CPC classification: G10L25/90, G10L21/02, G10L2025/937
Abstract: A pitch estimation system including a low-frequency band noise detector (LBND) operative to detect the presence of low-frequency band noise in a first audio frame, a frequency-domain pitch estimator operative to calculate a pitch estimation of a second audio frame from at least one spectral peak in the second audio frame, and a pitch estimator controller operative to cause the pitch estimator to exclude from the spectrum of the second audio frame at least one low-frequency spectral peak below a predefined threshold where low-frequency band noise is present in the first audio frame.
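
The controller's role can be illustrated with a small filter over spectral peaks. This sketch assumes the "predefined threshold" is a frequency cutoff; the function name, the 100 Hz default and the peak values are hypothetical and are not taken from the patent.

```python
def peaks_for_pitch_estimation(second_frame_peaks, noise_in_first_frame,
                               cutoff_hz=100.0):
    """Select the spectral peaks the pitch estimator is allowed to use.

    second_frame_peaks: list of (frequency_hz, amplitude) tuples.
    noise_in_first_frame: detector decision made on the earlier frame.
    cutoff_hz: assumed frequency threshold below which peaks are dropped.
    """
    if not noise_in_first_frame:
        return list(second_frame_peaks)
    return [(freq, amp) for freq, amp in second_frame_peaks if freq >= cutoff_hz]


# A 50 Hz hum peak next to the harmonics of a 210 Hz voice (made-up values).
peaks = [(50.0, 0.9), (210.0, 0.7), (420.0, 0.5), (630.0, 0.3)]
kept_when_noisy = peaks_for_pitch_estimation(peaks, noise_in_first_frame=True)
kept_when_clean = peaks_for_pitch_estimation(peaks, noise_in_first_frame=False)
```

Without the exclusion, the strong 50 Hz peak could pull a peak-based estimator toward the hum rather than the voice.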

24. Patent application

US20060123361A1 Methods and systems for representing breadcrumb paths, breadcrumb inline menus and hierarchical structure in a web environment (in force)
Publication No.: US20060123361A1
Publication date: 2006-06-08
Application No.: US11005628
Filing date: 2004-12-06
Applicants: Alexander Sorin, David Stephens
Inventors: Alexander Sorin, David Stephens
IPC classification: G06F3/00, G06F17/00, G06F17/24
CPC classification: G06F17/30873
Abstract: A computer-implemented method of representing a user's navigational path in a web environment includes steps of enabling a user to successively display a plurality of pages of a web site or a web application; maintaining a full breadcrumb path that includes a link to each of the plurality of pages previously displayed to the user; displaying a shortened breadcrumb path on a currently displayed page of the plurality of pages, the shortened breadcrumb path including fewer links than the full breadcrumb path; and displaying the full breadcrumb path only when requested by the user. A computer-implemented method of representing a user's current location within a hierarchical structure of a web site or web application includes steps of maintaining a hierarchical map of the web site or web application; generating a web page that has a predetermined level within the hierarchical map; displaying a hierarchical structure selector on the web page; and, when the displayed hierarchical structure selector is selected by a user, displaying an indication of the predetermined level and a link to each hierarchically higher level of the generated web page.
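
The first method in the abstract (full path kept, shortened path shown, full path on request) can be sketched as a small class. The class name, the `max_visible` parameter and the "..." prefix are assumptions made for illustration; the abstract does not say how the shortened path is chosen.

```python
class BreadcrumbTrail:
    """Keeps every visited page but shows only the most recent links by default."""

    def __init__(self, max_visible=3):
        if max_visible < 1:
            raise ValueError("max_visible must be at least 1")
        self.max_visible = max_visible
        self._links = []  # full breadcrumb path as (title, url) pairs

    def visit(self, title, url):
        """Record a page the user has displayed."""
        self._links.append((title, url))

    def full_path(self):
        return list(self._links)

    def shortened_path(self):
        """Assumed shortening rule: keep only the last max_visible links."""
        return self._links[-self.max_visible:]

    def render(self, show_full=False):
        """Text rendering; show_full=True models the user's explicit request."""
        links = self.full_path() if show_full else self.shortened_path()
        hidden = len(self._links) - len(links)
        prefix = "... > " if hidden > 0 else ""
        return prefix + " > ".join(title for title, _ in links)


trail = BreadcrumbTrail(max_visible=2)
for title, url in [("Home", "/"), ("Products", "/products"),
                   ("Laptops", "/products/laptops"),
                   ("Model X", "/products/laptops/x")]:
    trail.visit(title, url)

short_text = trail.render()
full_text = trail.render(show_full=True)
```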

25. Granted patent

US06988064B2 System and method for combined frequency-domain and time-domain pitch extraction for speech signals (in force)
Publication No.: US06988064B2
Publication date: 2006-01-17
Application No.: US10403792
Filing date: 2003-03-31
Applicants: Tenkasi V. Ramabadran, Alexander Sorin
Inventors: Tenkasi V. Ramabadran, Alexander Sorin
IPC classification: G10L11/04
CPC classification: G10L25/90
Abstract: A system, computer readable medium, and method for sampling a speech signal; dividing the sampled speech signal into overlapped frames; extracting first pitch information from a frame using frequency domain analysis; providing at least one pitch candidate, each being associated with a spectral score, from the first pitch information, each of the at least one pitch candidate representing a possible pitch estimate for the frame; extracting second pitch information from the frame using a time domain analysis; providing a correlation score for the at least one pitch candidate from the second pitch information; and selecting one of the at least one pitch candidate to represent the pitch estimate of the frame. The system, computer readable medium, and method are suitable for speech coding and for distributed speech recognition.
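
The selection step can be sketched as follows. The sketch takes the frequency-domain candidates and their spectral scores as given, uses a normalized autocorrelation at the candidate's lag as the time-domain correlation score, and picks the candidate with the largest sum of the two scores. The summing rule and all names are assumptions; the abstract does not specify how the scores are combined.

```python
import math


def normalized_autocorrelation(frame, lag):
    """Correlation between the frame and a copy of itself shifted by `lag` samples."""
    head, tail = frame[:-lag], frame[lag:]
    numerator = sum(x * y for x, y in zip(head, tail))
    denominator = math.sqrt(sum(x * x for x in head) * sum(y * y for y in tail))
    return numerator / denominator if denominator > 0.0 else 0.0


def select_pitch(frame, sample_rate, candidates):
    """candidates: list of (pitch_hz, spectral_score) from frequency-domain analysis."""
    best_pitch, best_total = None, float("-inf")
    for pitch_hz, spectral_score in candidates:
        lag = round(sample_rate / pitch_hz)
        if not 0 < lag < len(frame):
            continue  # lag does not fit inside the frame
        correlation_score = normalized_autocorrelation(frame, lag)
        total = spectral_score + correlation_score  # assumed combination rule
        if total > best_total:
            best_pitch, best_total = pitch_hz, total
    return best_pitch


# Synthetic 200 Hz tone, 8 kHz sampling, 40 ms frame (illustrative input).
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(320)]
candidates = [(100.0, 0.3), (200.0, 0.6), (400.0, 0.7)]
estimate = select_pitch(frame, sample_rate, candidates)
```

Here the 400 Hz candidate has the highest spectral score, but its lag is half a period of the tone, so its correlation score is strongly negative and the 200 Hz candidate is selected.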

26. Granted patent

US06961696B2 Class quantization for distributed speech recognition (in force)
Publication No.: US06961696B2
Publication date: 2005-11-01
Application No.: US10360582
Filing date: 2003-02-07
Applicants: Tenkasi V. Ramabadran, Alexander Sorin
Inventors: Tenkasi V. Ramabadran, Alexander Sorin
IPC classification: G10L20060101, G10L11/00, G10L11/04, G10L11/06, G10L15/28, G10L21/00
CPC classification: G10L25/93, G10L15/30, G10L25/90, G10L2025/935
Abstract: A system, method and computer readable medium for quantizing class information and pitch information of audio is disclosed. The method on an information processing system includes receiving audio and capturing a frame of the audio. The method further includes determining a pitch of the frame and calculating a codeword representing the pitch of the frame, wherein a first codeword value indicates an indefinite pitch. The method further includes determining a class of the frame, wherein the class is any one of at least two classes indicating an indefinite pitch and at least one class indicating a definite pitch. The method further includes calculating a codeword representing the class of the frame, wherein the codeword length is the maximum of the minimum number of bits required to represent the at least two classes and the minimum number of bits required to represent the at least one class.
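
The codeword-length rule in the last sentence is simple arithmetic and can be written out directly. The sketch reads the rule as follows: because the pitch codeword already says whether the pitch is indefinite, the class codeword only has to distinguish classes within one group, so its length is the larger of the two per-group bit counts. The function names and the class counts in the example are hypothetical.

```python
def min_bits(count):
    """Minimum number of bits needed to distinguish `count` alternatives."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return (count - 1).bit_length()


def class_codeword_length(indefinite_pitch_classes, definite_pitch_classes):
    """Maximum of the per-group minimum bit counts, as stated in the abstract."""
    if indefinite_pitch_classes < 2:
        raise ValueError("the abstract requires at least two indefinite-pitch classes")
    return max(min_bits(indefinite_pitch_classes),
               min_bits(definite_pitch_classes))


# Example: two indefinite-pitch classes and two definite-pitch classes
# (four classes in total) fit in a single class bit under this reading.
length_2_2 = class_codeword_length(2, 2)
length_2_1 = class_codeword_length(2, 1)
length_3_5 = class_codeword_length(3, 5)
```

For comparison, encoding four classes without help from the pitch codeword would take two bits.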
