    • 12. Invention application
    • SYSTEMS AND METHODS FOR TEXT NORMALIZATION FOR TEXT TO SPEECH SYNTHESIS
    • US20100082348A1
    • 2010-04-01
    • US12240449
    • 2008-09-29
    • Kim Silverman, Devang Naik, Jerome Bellegarda, Kevin Lenzo
    • G10L13/08
    • G10L13/08
    • Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    • 13. Invention application
    • SYSTEMS AND METHODS FOR CONCATENATION OF WORDS IN TEXT TO SPEECH SYNTHESIS
    • US20100082347A1
    • 2010-04-01
    • US12240433
    • 2008-09-29
    • Matthew Rogers, Kim Silverman, Devang Naik, Benjamin Rottler
    • G10L13/08
    • G10L13/08
    • Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    • 14. Invention application
    • SYSTEMS AND METHODS FOR SPEECH PREPROCESSING IN TEXT TO SPEECH SYNTHESIS
    • US20100082328A1
    • 2010-04-01
    • US12240397
    • 2008-09-29
    • Matthew Rogers, Kim Silverman, Devang Naik, Kevin Lenzo, Benjamin Rottler
    • G06F17/20, G10L13/08
    • G10L13/08, G06F17/275
    • Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    • 16. Invention application
    • Audio user interface for computing devices
    • US20060095848A1
    • 2006-05-04
    • US10981993
    • 2004-11-04
    • Devang Naik
    • G06F9/00, G11B27/00
    • G06F3/167, G06F9/451, G11B27/10, G11B27/34
    • An audio user interface that generates audio prompts that help a user interact with a user interface of a device is disclosed. One aspect of the present invention pertains to techniques for providing the audio user interface by efficiently leveraging the computing resources of a host computer system. The relatively powerful computing resources of the host computer can convert text strings into audio files that are then transferred to the computing device. The host system performs the process intensive text-to-speech conversion so that a computing device, such as a hand-held device, only needs to perform the less intensive task of playing the audio file. The computing device can be, for example, a media player such as an MP3 player, a mobile phone, or a personal digital assistant.
    • 17. Granted invention patent
    • Combined dual spectral and temporal alignment method for user authentication by voice
    • US06697779B1
    • 2004-02-24
    • US09677385
    • 2000-09-29
    • Jerome Bellegarda, Devang Naik, Matthias Neeracher, Kim Silverman
    • G10L17/00
    • G10L17/04
    • A method and system for training a user authentication by voice signal are described. In one embodiment, during training, a set of all spectral feature vectors for a given speaker is globally decomposed into speaker-specific decomposition units and a speaker-specific recognition unit. During recognition, spectral feature vectors are locally decomposed into speaker-specific characteristic units. The speaker-specific recognition unit is used together with selected speaker-specific characteristic units to compute a speaker-specific comparison unit. If the speaker-specific comparison unit is within a threshold limit, then the voice signal is authenticated. In addition, a speaker-specific content unit is time-aligned with selected speaker-specific characteristic units. If the alignment is within a threshold limit, then the voice signal is authenticated. In one embodiment, if both thresholds are satisfied, then the user is authenticated.
    • 18. Granted invention patent
    • Systems and methods for text to speech synthesis
    • US08352272B2
    • 2013-01-08
    • US12240404
    • 2008-09-29
    • Matthew Rogers, Kim Silverman, Devang Naik, Kevin Lenzo, Benjamin Rottler
    • G10L13/08
    • G10L13/00
    • Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    • 20. Invention application
    • SYSTEMS AND METHODS FOR TEXT TO SPEECH SYNTHESIS
    • US20100082346A1
    • 2010-04-01
    • US12240404
    • 2008-09-29
    • Matthew Rogers, Kim Silverman, DeVang Naik, Kevin Lenzo, Benjamin Rottler
    • G10L13/08, G10L13/00, G10L21/00
    • G10L13/00
    • Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
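
The text-to-speech abstracts listed above mention normalizing a text string and determining its native language before a target phoneme is obtained. The records do not give concrete rules, so the following is only a rough sketch of those first two steps under stated assumptions: the abbreviation table, the marker-word lists, and the function names are all hypothetical, and the phoneme lookup and rendering stages are left out entirely.

```python
# Hypothetical abbreviation table; the abstracts do not list actual rules.
ABBREVIATIONS = {"feat.": "featuring", "&": "and", "vol.": "volume"}

# Hypothetical marker words used for a toy language guess.
LANGUAGE_HINTS = {
    "fr": {"le", "la", "les", "et", "une"},
    "es": {"el", "los", "y", "una", "del"},
}


def normalize(text: str) -> str:
    """Lowercase, expand known abbreviations, and collapse whitespace."""
    tokens = text.lower().split()
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


def guess_language(text: str, default: str = "en") -> str:
    """Pick the language whose marker words occur most often, else the default."""
    words = set(text.split())
    best, best_hits = default, 0
    for lang, hints in LANGUAGE_HINTS.items():
        hits = len(words & hints)
        if hits > best_hits:
            best, best_hits = lang, hits
    return best


def prepare_for_synthesis(title: str) -> dict:
    """Return the normalized text and a guessed language tag for a media title."""
    clean = normalize(title)
    return {"text": clean, "language": guess_language(clean)}
```

A call such as `prepare_for_synthesis("Rock & Roll feat. Someone")` yields the expanded text together with a language tag that a later phoneme-selection stage could consume. A production system would presumably use far richer rules and a proper language identifier.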