Patent Quick Search - Quickly search global patents, free patent database for commercial use - IPRDB

1. Granted patent

US08566349B2 Handwritten document categorizer and method of training (in force)
Publication number: US08566349B2
Publication date: 2013-10-22
Application number: US12567920
Filing date: 2009-09-28
Applicants: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
Inventors: Francois Ragnet, Florent C. Perronnin, Thierry Lehoux
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06K9/00879, G06K9/2054, G06K9/6256, G06K2209/01
Abstract: A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
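The first step of the abstract above, identifying discriminative words per category from labeled typed documents, can be illustrated with a minimal sketch. The abstract does not say how discriminativeness is measured, so the smoothed log-odds score used here is an assumption, as are the function name `discriminative_words`, its parameters, and the toy data.

```python
import math
from collections import Counter

def discriminative_words(docs, top_k=2, alpha=1.0):
    """Rank words per category by a smoothed log-odds score.

    docs: list of (category, text) pairs standing in for OCR output of
    typed documents. The scoring rule is an illustrative assumption,
    not the method claimed in the patent.
    """
    per_cat = {}
    for cat, text in docs:
        per_cat.setdefault(cat, Counter()).update(text.lower().split())

    total = Counter()
    for counts in per_cat.values():
        total.update(counts)
    vocab_size = len(total)
    total_n = sum(total.values())

    result = {}
    for cat, counts in per_cat.items():
        n_in = sum(counts.values())   # tokens inside this category
        n_out = total_n - n_in        # tokens in all other categories

        def score(word):
            p_in = (counts[word] + alpha) / (n_in + alpha * vocab_size)
            p_out = (total[word] - counts[word] + alpha) / (n_out + alpha * vocab_size)
            return math.log(p_in / p_out)

        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(counts, key=lambda w: (-score(w), w))
        result[cat] = ranked[:top_k]
    return result

docs = [
    ("invoice", "invoice total amount due"),
    ("invoice", "invoice amount payment"),
    ("complaint", "complaint refund service poor"),
    ("complaint", "complaint service delay"),
]
keywords = discriminative_words(docs)
```

The selected words would then serve as the keyword group for which samples are synthesized in multiple fonts; that later stage is not covered by this sketch.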

2. Granted patent

US08453922B2 Method for one-step document categorization and separation using stamped machine recognizable patterns (in force)
Publication number: US08453922B2
Publication date: 2013-06-04
Application number: US12702897
Filing date: 2010-02-09
Applicants: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
Inventors: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
IPC classification: G06F17/00, G06K19/06
CPC classification: G06F17/30563, G06F17/30011
Abstract: A method for separating and categorizing documents includes receiving a scanned batch of documents. The batch includes scanned documents to which document separator stamps have been applied before scanning. Each stamp includes machine recognizable patterns applied on a same page of a document, spaced by a designated field for receiving a user-applied category code. The scanned batch of documents is processed to identify pages that contain a document separator, including identifying at least one of two spaced patterns. For a document page for which a document separator is identified, the corresponding designated field is located and the category code associated with the designated field is identified. The document containing the identified separator is separated from other documents in the batch based on the identified separator, and a document category is assigned to the document based on the identified category code.
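The separation step described in the abstract above can be sketched as a single pass over the scanned pages. This is a minimal sketch under stated assumptions: pattern detection and code reading are taken as already done, each page is represented as a hypothetical `(page_id, code)` pair where `code` is `None` when no separator stamp was found, and a stamped page is assumed to be the first page of its document.

```python
def split_batch(pages):
    """Split a scanned batch into categorized documents.

    pages: list of (page_id, code) pairs; code is the category code read
    from the designated field of a separator stamp, or None if the page
    carries no stamp. Both the representation and the rule that a stamped
    page starts a new document are illustrative assumptions.
    """
    documents = []
    for page_id, code in pages:
        # A stamped page opens a new document; so does the very first page,
        # so that unstamped leading pages are not lost.
        if code is not None or not documents:
            documents.append({"category": code, "pages": []})
        documents[-1]["pages"].append(page_id)
    return documents

batch = [(1, "A"), (2, None), (3, "B"), (4, None), (5, None)]
separated = split_batch(batch)
```

Pages scanned before the first stamp end up in a document whose category is `None`, which leaves them available for manual review rather than silently attaching them to a later document.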

3. Patent application

US20110192894A1 METHOD FOR ONE-STEP DOCUMENT CATEGORIZATION AND SEPARATION (in force)
Publication number: US20110192894A1
Publication date: 2011-08-11
Application number: US12702897
Filing date: 2010-02-09
Applicants: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
Inventors: Francois Ragnet, John A. Moore, Nicolas Raphaël Saubat, Eric H. Cheminot, Thierry Lehoux
IPC classification: G06F17/00, G06K7/00, G09F3/00
CPC classification: G06F17/30563, G06F17/30011
Abstract: A method, apparatus, and hardcopy document are provided. The method provides for separating and categorizing documents and includes receiving a scanned batch of documents. The batch includes a plurality of scanned documents to which document separator stamps have been applied before scanning. Each document separator stamp includes first and second machine recognizable patterns applied on a same page of a document, the first and second patterns being spaced by a designated field for receiving a user-applied category code. The scanned batch of documents is processed to identify pages that contain a document separator, the processing including identifying at least one of the first and second spaced patterns. For each of a plurality of document pages for which a document separator is identified, the method includes locating the corresponding designated field and identifying the category code associated with the designated field. The document containing the identified separator is separated from other documents in the batch based on at least the identified separator and a document category is assigned to the document from a set of document categories, based on the identified category code.

4. Patent application

US20120033874A1 Learning weights of fonts for typed samples in handwritten keyword spotting (in force)
Publication number: US20120033874A1
Publication date: 2012-02-09
Application number: US12851092
Filing date: 2010-08-05
Applicants: Florent Perronnin, Thierry Lehoux, Francois Ragnet
Inventors: Florent Perronnin, Thierry Lehoux, Francois Ragnet
IPC classification: G06K9/00, G06K9/62
CPC classification: G06K9/00879, G06K9/6255, G06K9/6828
Abstract: A wordspotting system and method are disclosed. The method includes receiving a keyword and, for each of a set of typographical fonts, synthesizing a word image based on the keyword. A keyword model is trained based on the synthesized word images and the respective weights for each of the set of typographical fonts. Using the trained keyword model, handwritten word images of a collection of handwritten word images which match the keyword are identified. The weights allow a large set of fonts to be considered, with the weights indicating the relative relevance of each font for modeling a set of handwritten word images.
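One way to picture font weights that indicate "the relative relevance of each font for modeling a set of handwritten word images" is as mixture weights fitted to handwritten samples. The abstract does not state how the weights are learned, so the expectation-maximization update below is an assumption, as are the function name `learn_font_weights` and the toy likelihood values.

```python
def learn_font_weights(likelihoods, iterations=50):
    """Estimate one weight per font as a mixture weight via EM.

    likelihoods[t][i] is the likelihood of handwritten sample t under a
    model built from font i. These values are assumed to be given; how
    they would be computed is outside this sketch, and the EM rule itself
    is an illustrative choice, not the patent's stated procedure.
    """
    n_fonts = len(likelihoods[0])
    weights = [1.0 / n_fonts] * n_fonts  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_fonts
        for row in likelihoods:
            mix = sum(w * p for w, p in zip(weights, row))
            for i, p in enumerate(row):
                # Responsibility of font i for this handwritten sample.
                totals[i] += weights[i] * p / mix
        weights = [t / len(likelihoods) for t in totals]
    return weights

# Toy data: three handwritten samples scored against three fonts.
toy_likelihoods = [
    [0.9, 0.1, 0.05],
    [0.8, 0.2, 0.10],
    [0.7, 0.1, 0.30],
]
font_weights = learn_font_weights(toy_likelihoods)
```

Under this formulation a font that rarely explains any handwritten sample drifts toward a weight near zero, which matches the abstract's point that a large font set can be considered without every font contributing equally.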

5. Patent application

US20110078191A1 HANDWRITTEN DOCUMENT CATEGORIZER AND METHOD OF TRAINING (in force)
Publication number: US20110078191A1
Publication date: 2011-03-31
Application number: US12567920
Filing date: 2009-09-28
Applicants: Francois RAGNET, Florent C. Perronnin, Thierry Lehoux
Inventors: Francois RAGNET, Florent C. Perronnin, Thierry Lehoux
IPC classification: G06F17/30
CPC classification: G06F17/30705, G06K9/00879, G06K9/2054, G06K9/6256, G06K2209/01
Abstract: A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.

6. Granted patent

US08509537B2 Learning weights of fonts for typed samples in handwritten keyword spotting (in force)
Publication number: US08509537B2
Publication date: 2013-08-13
Application number: US12851092
Filing date: 2010-08-05
Applicants: Florent C. Perronnin, Thierry Lehoux, Francois Ragnet
Inventors: Florent C. Perronnin, Thierry Lehoux, Francois Ragnet
IPC classification: G06K9/18, G06K9/62, G06K9/72
CPC classification: G06K9/00879, G06K9/6255, G06K9/6828
Abstract: A wordspotting system and method are disclosed. The method includes receiving a keyword and, for each of a set of typographical fonts, synthesizing a word image based on the keyword. A keyword model is trained based on the synthesized word images and the respective weights for each of the set of typographical fonts. Using the trained keyword model, handwritten word images of a collection of handwritten word images which match the keyword are identified. The weights allow a large set of fonts to be considered, with the weights indicating the relative relevance of each font for modeling a set of handwritten word images.
