Quick Patent Search - Fast Search of Global Patents, Free Commercial Patent Database - IPRDB

1. Invention application

US20070143107A1 Remote tracing and debugging of automatic speech recognition servers by speech reconstruction from cepstra and pitch information (In force)
Publication No.: US20070143107A1
Publication date: 2007-06-21
Application No.: US11311753
Filing date: 2005-12-19
Applicants: Shay Ben-David, Baiju Mandalia, Zohar Sivan, Alexander Sorin
Inventors: Shay Ben-David, Baiju Mandalia, Zohar Sivan, Alexander Sorin
IPC classification: G10L19/14
CPC classification: G10L15/30, G10L15/063, G10L25/24
Abstract: Methods and systems are provided for remote tuning and debugging of an automatic speech recognition system. Trace files are generated on-site from input speech by efficient, lossless compression of MFCC data, which is merged with compressed pitch and voicing information and stored as trace files. The trace files are transferred to a remote site where human-intelligible speech is reconstructed and analyzed. Based on the analysis, parameters of the automatic speech recognition system are remotely adjusted.

2. Granted invention patent

US07783488B2 Remote tracing and debugging of automatic speech recognition servers by speech reconstruction from cepstra and pitch information (In force)
Publication No.: US07783488B2
Publication date: 2010-08-24
Application No.: US11311753
Filing date: 2005-12-19
Applicants: Shay Ben-David, Baiju Dhirajlal Mandalia, Zohar Sivan, Alexander Sorin
Inventors: Shay Ben-David, Baiju Dhirajlal Mandalia, Zohar Sivan, Alexander Sorin
IPC classification: G06K9/00, G06F17/27, G10L15/00, G10L21/00
CPC classification: G10L15/30, G10L15/063, G10L25/24
Abstract: Methods and systems are provided for remote tuning and debugging of an automatic speech recognition system. Trace files are generated on-site from input speech by efficient, lossless compression of MFCC data, which is merged with compressed pitch and voicing information and stored as trace files. The trace files are transferred to a remote site where human-intelligible speech is reconstructed and analyzed. Based on the analysis, parameters of the automatic speech recognition system are remotely adjusted.

3. Granted invention patent

US10134009B2 Methods and systems of providing supplemental informaton (In force)
Publication No.: US10134009B2
Publication date: 2018-11-20
Application No.: US13802113
Filing date: 2013-03-13
Applicants: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
IPC classification: G06Q10/10
Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.

4. Invention application

US20140281846A1 METHODS AND SYSTEMS OF PROVIDING SUPPLEMENTAL INFORMATON (Under examination, published)
Publication No.: US20140281846A1
Publication date: 2014-09-18
Application No.: US13802113
Filing date: 2013-03-13
Applicants: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
Inventors: Alexander Sorin, David Siegel, Michael Thompson, Julian Gosper
IPC classification: G06F17/24
CPC classification: G06Q10/105
Abstract: At least one analytical operation from a set of different analytical operations may be determined based on at least one input. The input(s) may comprise contextual information of working content being displayed to a user on a device and comprising numerical data. Supplemental information for the working content may be generated using the determined analytical operation(s), may comprise a numerical-based analysis of the numerical data, and may be caused to be displayed to the user concurrently with the working content. The contextual information may comprise structured data. The input(s) may further comprise at least one of a history of the user's interactions with the working content, a history of the user's interactions with recommendations of supplemental information for the working content, a history of other users' interactions with the working content, and a history of other users' interactions with recommendations of supplemental information for the working content.

5. Invention application

US20050131680A1 Speech synthesis using complex spectral modeling (In force)
Publication No.: US20050131680A1
Publication date: 2005-06-16
Application No.: US11046911
Filing date: 2005-01-31
Applicants: Dan Chazan, Ron Hoory, Zvi Kons, Slava Shechtman, Alexander Sorin
Inventors: Dan Chazan, Ron Hoory, Zvi Kons, Slava Shechtman, Alexander Sorin
IPC classification: G10L11/00, G10L11/04, G10L13/08, G10L19/02, G10L19/14
CPC classification: G10L13/08, G10L19/02
Abstract: A method for processing a speech signal includes dividing the speech signal into a succession of frames, identifying one or more of the frames as click frames, and extracting phase information from the click frames. The speech signal is encoded using the phase information. Methods are also provided for modeling phase spectra of voiced frames and click frames.
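Illustrative note: the first step named in the abstract above, dividing the speech signal into a succession of frames, can be sketched as follows. This is only a minimal sketch of generic framing, not the patent's implementation; the function name `split_into_frames` and the `frame_len` and `hop` parameters are assumptions introduced here, and the click-frame identification and phase extraction steps are not shown.

```python
def split_into_frames(signal, frame_len, hop):
    """Split a sequence of samples into (possibly overlapping) frames.

    frame_len and hop are hypothetical parameters (in samples); the
    abstract does not specify frame sizes. A trailing partial frame
    is dropped.
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    last_start = len(signal) - frame_len
    # range() is empty when the signal is shorter than one frame
    return [signal[i:i + frame_len] for i in range(0, last_start + 1, hop)]


# Example: 10 samples, frames of 4 samples advancing by 2 samples
frames = split_into_frames(list(range(10)), frame_len=4, hop=2)
```

With these example values the frames overlap by two samples each.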

6. Granted invention patent

US09484045B2 System and method for automatic prediction of speech suitability for statistical modeling (In force)
Publication No.: US09484045B2
Publication date: 2016-11-01
Application No.: US13606618
Filing date: 2012-09-07
Applicants: Alexander Sorin, Slava Shechtman, Vincent Pollet
Inventors: Alexander Sorin, Slava Shechtman, Vincent Pollet
IPC classification: G10L13/06, G10L13/04, G10L19/00, G10L25/48, G10L25/18
CPC classification: G10L25/48, G10L13/04, G10L25/18
Abstract: An embodiment according to the invention provides a capability of automatically predicting how favorable a given speech signal is for statistical modeling, which is advantageous in a variety of different contexts. In Multi-Form Segment (MFS) synthesis, for example, an embodiment according to the invention uses prediction capability to provide an automatic acoustic driven template versus model decision maker with an output quality that is high, stable and depends gradually on the system footprint. In speaker selection for a statistical Text-to-Speech synthesis (TTS) system build, as another example context, an embodiment according to the invention enables a fast selection of the most appropriate speaker among several available ones for the full voice dataset recording and preparation, based on a small amount of recorded speech material.

7. Granted invention patent

US08280724B2 Speech synthesis using complex spectral modeling (In force)
Publication No.: US08280724B2
Publication date: 2012-10-02
Application No.: US11046911
Filing date: 2005-01-31
Applicants: Dan Chazan, Ron Hoory, Zvi Kons, Slava Shechtman, Alexander Sorin
Inventors: Dan Chazan, Ron Hoory, Zvi Kons, Slava Shechtman, Alexander Sorin
IPC classification: G10L11/04, G10L19/14, G10L11/06, G10L19/06
CPC classification: G10L13/08, G10L19/02
Abstract: A method for processing a speech signal includes dividing the speech signal into a succession of frames, identifying one or more of the frames as click frames, and extracting phase information from the click frames. The speech signal is encoded using the phase information. Methods are also provided for modeling phase spectra of voiced frames and click frames.

8. Granted invention patent

US07305339B2 Restoration of high-order Mel Frequency Cepstral Coefficients (In force)
Publication No.: US07305339B2
Publication date: 2007-12-04
Application No.: US10405733
Filing date: 2003-04-01
Applicant: Alexander Sorin
Inventor: Alexander Sorin
IPC classification: G10L15/00
CPC classification: G10L15/02, G10L25/24
Abstract: A method for estimating high-order Mel Frequency Cepstral Coefficients, the method comprising initializing any of N−L high-order coefficients (HOC) of an MFCC vector of length N having L low-order coefficients (LOC) to a predetermined value, thereby forming a candidate MFCC vector, synthesizing a speech signal frame from the candidate MFCC vector and a pitch value, and computing an N-dimensional MFCC vector from the synthesized frame, thereby producing an output MFCC vector.
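Illustrative note: the initialization step in the abstract above (forming a candidate MFCC vector of length N from L known low-order coefficients) might look like the sketch below. It is a hedged illustration only: the function name `make_candidate_mfcc` and the default fill value of 0.0 are assumptions, all N−L high-order coefficients are set to the same value for simplicity, and the subsequent synthesis and MFCC recomputation steps are not shown.

```python
def make_candidate_mfcc(low_order, n, fill=0.0):
    """Form a candidate MFCC vector of length n.

    low_order: the L known low-order coefficients (LOC).
    fill: hypothetical predetermined value assigned to each of the
          n - L high-order coefficients (HOC); 0.0 is an assumption.
    """
    num_low = len(low_order)
    if num_low > n:
        raise ValueError("n must be at least the number of low-order coefficients")
    # Keep the low-order coefficients, append placeholder high-order ones
    return list(low_order) + [fill] * (n - num_low)


# Example: L = 3 known coefficients, target length N = 5
candidate = make_candidate_mfcc([1.5, -0.25, 0.75], n=5)
```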

9. Granted invention patent

US06487311B1 OCR-based image compression (In force)
Publication No.: US06487311B1
Publication date: 2002-11-26
Application No.: US09304861
Filing date: 1999-05-04
Applicants: Yaniv Gal, Alexander Sorin, Andrei Heilper, Eugene Wallach
Inventors: Yaniv Gal, Alexander Sorin, Andrei Heilper, Eugene Wallach
IPC classification: G06K9/68
CPC classification: H04N1/4115
Abstract: A method for compressing a digitized image of a document using optical character recognition (OCR). The method includes performing optical character recognition (OCR) on the digitized image, identifying, based, at least in part, on a result of the performing step, a plurality of classes of characters comprised in the image, each the class of characters having an associated character value and comprising at least one character, pruning each class of characters, thereby producing information describing the plurality of classes of characters and a residual image, and utilizing the information describing the plurality of classes of characters and the residual image as a compressed digitized image in further processing. Related methods and apparatus are also disclosed.

10. Granted invention patent

US08682670B2 Statistical enhancement of speech output from a statistical text-to-speech synthesis system (Lapsed)
Publication No.: US08682670B2
Publication date: 2014-03-25
Application No.: US13177577
Filing date: 2011-07-07
Applicants: Slava Shechtman, Alexander Sorin
Inventors: Slava Shechtman, Alexander Sorin
IPC classification: G10L13/00
CPC classification: G10L13/033, G10L13/06
Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors. The method further includes: receiving a feature vector output by the system; and generating an instance of the corrective transformation by: calculating a reference value of the distortion indicator attributed to a statistical model of the phonetic unit emitting the feature vector; calculating an actual value of the distortion indicator attributed to feature vectors emitted by the statistical model of the phonetic unit emitting the feature vector; calculating the enhancing parameter values depending on the reference value of the distortion indicator, the actual value of the distortion indicator and the parametric corrective transformation; and deriving an instance of the corrective transformation corresponding to the enhancing parameter values from the parametric family of the corrective transformations. The instance of the corrective transformation may be applied to the feature vector to provide an enhanced feature vector.
