    • 2. Granted invention patent
    • Facial feature extraction method and apparatus for a neural network acoustic and visual speech recognition system
    • Publication No.: US5680481A
    • Grant date: 1997-10-21
    • Application No.: US488840
    • Filing date: 1995-06-09
    • Inventor(s): K. Venkatesh Prasad; David G. Stork
    • Applicant(s): K. Venkatesh Prasad; David G. Stork
    • IPC: G06K9/00; G06K9/68; G06N3/04; G10L15/16; G10L15/24; G06K9/46; G06K9/32; G06K9/62
    • CPC: G06K9/00281; G06K9/00335; G06K9/685; G06N3/049; G10L15/25; G10L15/16
    • Abstract: A facial feature extraction method and apparatus uses the variation in light intensity (gray scale) across a frontal view of a speaker's face. The sequence of video images is sampled and quantized into a regular array of 150×150 pixels, which naturally forms a coordinate system of scan lines and pixel positions along each scan line. Left and right eye areas and a mouth area are located by thresholding the pixel gray scale and finding the centroids of the three areas. The line segment joining the eye-area centroids is bisected at a right angle to form an axis of symmetry. A straight line through the centroid of the mouth area, at a right angle to the axis of symmetry, constitutes the mouth line. Pixels along the mouth line and along the axis of symmetry in the vicinity of the mouth area form horizontal and vertical gray-scale profiles, respectively. The profiles could be used directly as feature vectors, but it is more efficient to select the peaks and valleys (maxima and minima) of each profile that correspond to important physiological speech features, such as lower- and upper-lip, mouth-corner, and mouth-area positions and pixel values, together with their time derivatives, as the visual vector components. Time derivatives are estimated from changes in pixel position and value between video image frames. A speech recognition system uses the visual feature vector, in combination with a concomitant acoustic vector, as input to a time-delay neural network. (A code sketch of this extraction pipeline follows this entry.)
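The pipeline this abstract describes — threshold the gray-scale frame, locate eye and mouth centroids, derive the symmetry axis and mouth line, extract the two gray-scale profiles, and pick their peaks and valleys — can be sketched in a few lines of NumPy. This is a minimal illustration under assumed parameters: the threshold value, the quadrant region windows, the vertical-axis simplification for a frontal view, and all function names are assumptions for illustration, not taken from the patent.

```python
# Sketch of the gray-scale feature extraction described in US5680481A.
# Assumes a 150x150 uint8 frontal-face frame; threshold and region
# windows are illustrative assumptions, not the patented values.
import numpy as np

def centroid(mask: np.ndarray) -> np.ndarray:
    """Centroid (row, col) of the True pixels in a binary mask."""
    rows, cols = np.nonzero(mask)
    return np.array([rows.mean(), cols.mean()])

def extract_profiles(frame: np.ndarray, thresh: int = 60):
    h, w = frame.shape
    dark = frame < thresh  # eyes and mouth are darker than surrounding skin

    # Crude region windows (assumed): eyes in the top half, mouth below.
    left_eye  = centroid(dark[: h // 2, : w // 2])
    right_eye = centroid(dark[: h // 2, w // 2 :]) + [0, w // 2]
    mouth     = centroid(dark[h // 2 :, :])        + [h // 2, 0]

    # Axis of symmetry: perpendicular bisector of the eye-to-eye segment,
    # simplified here to a vertical column through its midpoint.
    mid_point = (left_eye + right_eye) / 2
    axis_col = int(round(mid_point[1]))

    # Mouth line: the row through the mouth centroid, at right angles
    # to the (near-vertical) axis of symmetry.
    mouth_row = int(round(mouth[0]))

    horizontal = frame[mouth_row, :].astype(float)  # profile along mouth line
    vertical   = frame[:, axis_col].astype(float)   # profile along symmetry axis
    return horizontal, vertical

def peaks_and_valleys(profile: np.ndarray) -> np.ndarray:
    """Indices of local maxima/minima -- the candidate visual features."""
    slope_sign = np.sign(np.diff(profile))
    return np.nonzero(np.diff(slope_sign))[0] + 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.integers(0, 256, (150, 150), dtype=np.uint8)  # stand-in frame
    h_prof, v_prof = extract_profiles(frame)
    print("extrema along mouth line:", peaks_and_valleys(h_prof)[:5])
```

Per the abstract, the selected extrema (and their frame-to-frame time derivatives, not shown) would then be concatenated with an acoustic vector and fed to a time-delay neural network.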
    • 3. Granted invention patent
    • Speaker recognition using spatiotemporal cues
    • Publication No.: US5625704A
    • Grant date: 1997-04-29
    • Application No.: US336974
    • Filing date: 1994-11-10
    • Inventor(s): K. Venkatesh Prasad
    • Applicant(s): K. Venkatesh Prasad
    • IPC: G06T7/00; G06K9/00; G06T7/20; G07C9/00; G10L15/24
    • CPC: G10L15/25; G06K9/00221; G06K9/00335; G07C9/00158
    • Abstract: A speaker recognition method uses visual image representations of the mouth movements associated with the generation of an acoustic utterance by the speaker, who is the person to be recognized. No acoustic data is used, and normal ambient lighting conditions suffice. The method generates a spatiotemporal gray-level function representing the inner mouth area confined between the lips during the utterance; from it, a cue block is generated that isolates the essential information from which a feature vector for recognition is derived. The feature vector includes the utterance duration; the maximum lip-to-lip separation and its location in time; the speed of lip opening and the speed of lip closure; and a spatiotemporal area measure representing the area enclosed between the lips during the utterance, i.e., the frontal area of the oral cavity. Experimental data show distinct clustering in feature space for different speakers. (A code sketch of this feature vector follows this entry.)
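The feature vector in this abstract can be illustrated from a single per-frame lip-separation signal. The sketch below is a loose interpretation under stated assumptions: the input signal, the frame rate, and the use of the time-integrated separation as a stand-in for the patent's spatiotemporal area measure are all assumptions for illustration, not the patented method.

```python
# Sketch of the spatiotemporal feature vector described in US5625704A,
# computed from an assumed 1-D signal of per-frame lip-to-lip separations.
import numpy as np

def utterance_features(lip_sep: np.ndarray, fps: float = 30.0) -> dict:
    """Feature vector for one utterance from lip separations (pixels/frame)."""
    t = np.arange(lip_sep.size) / fps          # frame timestamps in seconds
    peak = int(np.argmax(lip_sep))             # frame of maximum separation
    duration = float(t[-1])
    return {
        "duration_s":     duration,                        # utterance duration
        "max_separation": float(lip_sep[peak]),            # max lip-to-lip gap
        "peak_time_s":    float(t[peak]),                  # its location in time
        # Average speeds of lip opening (start -> peak) and closure
        # (peak -> end); tiny epsilon guards against division by zero.
        "opening_speed":  float(lip_sep[peak] / max(t[peak], 1e-9)),
        "closing_speed":  float(lip_sep[peak] / max(duration - t[peak], 1e-9)),
        # Assumed proxy for the spatiotemporal area measure: the lip
        # separation integrated over the utterance (rectangle rule).
        "st_area":        float(lip_sep.sum() / fps),
    }

if __name__ == "__main__":
    # Synthetic open-then-close lip trajectory for one utterance.
    sep = np.concatenate([np.linspace(0.0, 12.0, 10), np.linspace(12.0, 0.0, 20)])
    print(utterance_features(sep))
```

Per the abstract, vectors of this kind cluster by speaker in feature space, which is what makes the cues usable for recognition without any acoustic data.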