    • 1. Granted patent
    • Handwritten document categorizer and method of training
    • Publication number: US08566349B2
    • Publication date: 2013-10-22
    • Application number: US12567920
    • Filing date: 2009-09-28
    • Inventors: Francois Ragnet; Florent C. Perronnin; Thierry Lehoux
    • IPC: G06F17/30
    • CPC: G06F17/30705; G06K9/00879; G06K9/2054; G06K9/6256; G06K2209/01
    • A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
    • 3. Patent application
    • METHOD FOR ONE-STEP DOCUMENT CATEGORIZATION AND SEPARATION
    • Publication number: US20110192894A1
    • Publication date: 2011-08-11
    • Application number: US12702897
    • Filing date: 2010-02-09
    • Inventors: Francois Ragnet; John A. Moore; Nicolas Raphaël Saubat; Eric H. Cheminot; Thierry Lehoux
    • IPC: G06F17/00; G06K7/00; G09F3/00
    • CPC: G06F17/30563; G06F17/30011
    • A method, apparatus, and hardcopy document are provided. The method provides for separating and categorizing documents and includes receiving a scanned batch of documents. The batch includes a plurality of scanned documents to which document separator stamps have been applied before scanning. Each document separator stamp includes first and second machine recognizable patterns applied on a same page of a document, the first and second patterns being spaced by a designated field for receiving a user-applied category code. The scanned batch of documents is processed to identify pages that contain a document separator, the processing including identifying at least one of the first and second spaced patterns. For each of a plurality of document pages for which a document separator is identified, the method includes locating the corresponding designated field and identifying the category code associated with the designated field. The document containing the identified separator is separated from other documents in the batch based on at least the identified separator and a document category is assigned to the document from a set of document categories, based on the identified category code.
    • 5. Patent application
    • HANDWRITTEN DOCUMENT CATEGORIZER AND METHOD OF TRAINING
    • Publication number: US20110078191A1
    • Publication date: 2011-03-31
    • Application number: US12567920
    • Filing date: 2009-09-28
    • Inventors: Francois Ragnet; Florent C. Perronnin; Thierry Lehoux
    • IPC: G06F17/30
    • CPC: G06F17/30705; G06K9/00879; G06K9/2054; G06K9/6256; G06K2209/01
    • A method and an apparatus for training a handwritten document categorizer are disclosed. For each category in a set into which handwritten documents are to be categorized, discriminative words are identified from the OCR output of a training set of typed documents labeled by category. A group of keywords is established including some of the discriminative words identified for each category. Samples of each of the keywords in the group are synthesized using a plurality of different type fonts. A keyword model is then generated for each keyword, parameters of the model being estimated, at least initially, based on features extracted from the synthesized samples. Keyword statistics for each of a set of scanned handwritten documents labeled by category are generated by applying the generated keyword models to word images extracted from the scanned handwritten documents. The categorizer is trained with the keyword statistics and respective handwritten document labels.
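The training procedure in the abstract of US08566349B2 (and its pre-grant publication US20110078191A1) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the patented implementation: the discriminative-word score (an add-one smoothed log ratio of in-category to out-of-category word frequency) and the nearest-centroid categorizer are stand-ins, because the abstract names neither. The image side (font-synthesized keyword samples, keyword models applied to word images) is replaced by a hypothetical list of already-spotted keywords per handwritten document. All function names are hypothetical.

```python
from collections import Counter
import math


def discriminative_words(docs, top_k=2):
    """Pick the top_k most category-specific words for each category.

    docs: (category, ocr_text) pairs from labeled typed documents.
    Assumed score: add-one smoothed log ratio of a word's relative frequency
    inside the category vs. in all other categories.
    """
    counts = {}
    for cat, text in docs:
        counts.setdefault(cat, Counter()).update(text.lower().split())
    vocab = set()
    for c in counts.values():
        vocab.update(c)

    selected = {}
    for cat, in_counts in counts.items():
        out_counts = Counter()
        for other, c in counts.items():
            if other != cat:
                out_counts.update(c)
        n_in = sum(in_counts.values()) + len(vocab)
        n_out = sum(out_counts.values()) + len(vocab)

        def score(w):
            return (math.log((in_counts[w] + 1) / n_in)
                    - math.log((out_counts[w] + 1) / n_out))

        # Highest score first; ties broken alphabetically so results are deterministic.
        selected[cat] = sorted(vocab, key=lambda w: (-score(w), w))[:top_k]
    return selected


def keyword_group(selected):
    """Union of the per-category keywords, in a fixed order."""
    group = []
    for cat in sorted(selected):
        for w in selected[cat]:
            if w not in group:
                group.append(w)
    return group


def keyword_statistics(spotted, group):
    """Normalized keyword histogram for one handwritten document.

    spotted: keywords detected in the document's word images. Hypothetical
    stand-in for the output of the keyword models described in the abstract.
    """
    counts = Counter(w for w in spotted if w in group)
    total = sum(counts.values()) or 1  # avoid dividing by zero when nothing is spotted
    return [counts[k] / total for k in group]


def train_centroids(examples):
    """Nearest-centroid categorizer (a stand-in; the abstract does not name one)."""
    sums, n = {}, Counter()
    for cat, vec in examples:
        acc = sums.setdefault(cat, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        n[cat] += 1
    return {cat: [v / n[cat] for v in acc] for cat, acc in sums.items()}


def categorize(centroids, vec):
    """Return the category whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda cat: sum((a - b) ** 2 for a, b in zip(centroids[cat], vec)))


if __name__ == "__main__":
    typed = [
        ("invoice", "invoice total amount due"),
        ("invoice", "invoice number total"),
        ("letter", "dear customer regards"),
        ("letter", "dear sir regards total"),
    ]
    group = keyword_group(discriminative_words(typed, top_k=2))
    train = [
        ("invoice", keyword_statistics(["invoice", "amount", "invoice"], group)),
        ("letter", keyword_statistics(["dear", "regards"], group)),
    ]
    centroids = train_centroids(train)
    print(group)  # ['invoice', 'amount', 'dear', 'regards']
    print(categorize(centroids, keyword_statistics(["invoice", "due", "amount"], group)))  # invoice
```

Only keywords chosen from the typed corpus contribute to a handwritten document's statistics, which is why `due` in the last query is ignored. Any classifier that accepts fixed-length feature vectors could replace the nearest-centroid stand-in.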
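The separation and categorization steps in the abstract of US20110192894A1 can likewise be sketched, assuming pattern detection and reading of the designated field have already been done for each scanned page. This is illustrative only: the `Page` and `Document` types, the code-to-category mapping, and the assumption that a separator stamp marks the first page of a document are all hypothetical. The abstract itself only says the stamp is applied to a page of the document.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Page:
    """One scanned page after (hypothetical) pattern detection and field reading."""
    number: int
    first_pattern: bool = False   # first machine-recognizable pattern found
    second_pattern: bool = False  # second pattern found
    code: Optional[str] = None    # category code read from the designated field


@dataclass
class Document:
    pages: List[int] = field(default_factory=list)
    category: Optional[str] = None


def is_separator(page: Page) -> bool:
    # The abstract identifies a separator by finding at least one of the two patterns.
    return page.first_pattern or page.second_pattern


def separate_and_categorize(batch: List[Page], categories: Dict[str, str]) -> List[Document]:
    """Split a scanned batch into documents and assign each a category."""
    docs: List[Document] = []
    for page in batch:
        if is_separator(page) or not docs:
            # Assumption: a separator page starts a new document. Pages before the
            # first separator form an uncategorized document; unknown codes map to None.
            category = categories.get(page.code) if is_separator(page) else None
            docs.append(Document(category=category))
        docs[-1].pages.append(page.number)
    return docs


if __name__ == "__main__":
    batch = [Page(1, True, True, "A"), Page(2), Page(3, False, True, "B"), Page(4), Page(5)]
    for doc in separate_and_categorize(batch, {"A": "invoice", "B": "contract"}):
        print(doc.pages, doc.category)
    # [1, 2] invoice
    # [3, 4, 5] contract
```

Treating a page as a separator when either pattern is found follows the abstract's "at least one of the first and second spaced patterns"; one plausible benefit is that a stamp still registers if one pattern is damaged or cropped during scanning.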