    • 4. Invention application
    • OCR of books by word recognition
    • US20090263019A1
    • 2009-10-22
    • US12103717
    • 2008-04-16
    • Asaf Tzadok; Eugeniusz Walach
    • Asaf Tzadok; Eugeniusz Walach
    • G06K9/34
    • G06K9/00852; G06K9/6255
    • Disclosed embodiments of the invention provide automated global optimization methods and systems of OCR, tailored to each document being digitized. A document-specific database is created from an OCR scan of a document of interest, which contains an exhaustive listing of words in the document. Images of each word, taken from all the fonts encountered, are entered into the database and mapped to a corresponding textual representation. After entry of a first instance of an image of a word written in a particular font, each new occurrence of the word in that font can be quickly recognized by image processing techniques. The disclosed methods and systems may be used in conjunction with adaptive character recognition training and word recognition training of the OCR engines.
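The word-level lookup described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `WordImageDatabase`, the `max_distance` tolerance, the use of Hamming distance on equal-sized binarized bitmaps, and the `fake_ocr` stand-in are all assumptions made for illustration. The abstract only says that later occurrences of a word in the same font are recognized "by image processing techniques".

```python
from typing import Callable, Dict, List, Optional, Tuple

Bitmap = Tuple[Tuple[int, ...], ...]  # binarized word image: rows of 0/1 pixels


def hamming(a: Bitmap, b: Bitmap) -> Optional[int]:
    """Count differing pixels; return None when the images differ in size."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return None
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


class WordImageDatabase:
    """Document-specific store mapping (font, word image) to its text."""

    def __init__(self, max_distance: int = 1) -> None:
        # Hypothetical tolerance: maximum pixel differences still counted as a match.
        self.max_distance = max_distance
        self.entries: Dict[str, List[Tuple[Bitmap, str]]] = {}

    def lookup(self, font: str, image: Bitmap) -> Optional[str]:
        """Return the text of the closest stored image in this font, if any."""
        best_text: Optional[str] = None
        best_dist: Optional[int] = None
        for stored, text in self.entries.get(font, []):
            d = hamming(stored, image)
            if d is None or d > self.max_distance:
                continue
            if best_dist is None or d < best_dist:
                best_text, best_dist = text, d
        return best_text

    def recognize(
        self, font: str, image: Bitmap, ocr: Callable[[Bitmap], str]
    ) -> Tuple[str, bool]:
        """Return (text, from_database); run OCR only on a database miss."""
        cached = self.lookup(font, image)
        if cached is not None:
            return cached, True
        text = ocr(image)  # first instance of this word image in this font
        self.entries.setdefault(font, []).append((image, text))
        return text, False


# Toy usage with a stand-in OCR engine that records how often it is called.
ocr_calls: List[Bitmap] = []


def fake_ocr(image: Bitmap) -> str:
    ocr_calls.append(image)
    return "cat"


db = WordImageDatabase(max_distance=1)
clean = ((1, 0, 1), (0, 1, 0))
noisy = ((1, 0, 1), (0, 1, 1))  # same word, one pixel of scan noise
first = db.recognize("serif", clean, fake_ocr)
second = db.recognize("serif", noisy, fake_ocr)
other_font = db.recognize("sans", clean, fake_ocr)
```

In this toy run the second occurrence is served from the database without calling OCR, while the same bitmap in a different font triggers a fresh OCR call, mirroring the per-font mapping the abstract describes.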
    • 8. Granted invention patent
    • Adaptive OCR for books
    • US07480411B1
    • 2009-01-20
    • US12040946
    • 2008-03-03
    • Asaf Tzadok; Eugeniusz Walach
    • Asaf Tzadok; Eugeniusz Walach
    • G06K9/18
    • G06K9/03; G06K9/6256
    • A system/method is presented for scanning entire books or documents all at once using an adaptive process, where the book or document has known fonts and unknown fonts. The known fonts are processed through a verification system where sure words and error words are determined. Both the sure words and error words are sent to OCR training, where they are re-OCR'ed and repeatedly verified until they meet predetermined quality criteria. Characters or words not meeting the predetermined quality criteria receive additional OCR training until all the characters and words pass the predetermined quality criteria. Unknown fonts are scanned and clustered together by shape. Outliers in the shapes are manually keyed in. Those symbols that are manually classified go to OCR training and then to the known type optimization process.
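The re-OCR and verify cycle for known fonts can be sketched as a small control loop in Python. This is only an illustration under stated assumptions: the function `adaptive_ocr`, its `threshold` and `max_rounds` parameters, and the toy `recognize`, `verify`, and `train` callables are hypothetical, and the quality criterion is taken here to be the share of words that pass verification. The abstract does not specify how verification or training is done.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (word image, recognized text); images are strings in this toy


def adaptive_ocr(
    images: Sequence[str],
    recognize: Callable[[str], str],
    verify: Callable[[str], bool],
    train: Callable[[List[Pair], List[Pair]], None],
    threshold: float = 0.95,  # hypothetical quality criterion
    max_rounds: int = 10,     # hypothetical safety limit on retraining
) -> Tuple[List[str], List[float]]:
    """Repeat OCR -> verify -> train until the share of sure words meets threshold."""
    results: List[str] = []
    history: List[float] = []
    for _ in range(max_rounds):
        results = [recognize(img) for img in images]
        flags = [verify(text) for text in results]
        sure = [(i, t) for i, t, ok in zip(images, results, flags) if ok]
        errors = [(i, t) for i, t, ok in zip(images, results, flags) if not ok]
        quality = len(sure) / len(results) if results else 1.0
        history.append(quality)
        if quality >= threshold:
            break
        train(sure, errors)  # both sure words and error words go to training
    return results, history


# Toy stand-ins: a substitution table plays the OCR model, a lexicon the verifier.
table = {}
lexicon = {"cat", "dog"}


def recognize(img: str) -> str:
    return "".join(table.get(ch, ch) for ch in img)


def verify(text: str) -> bool:
    return text in lexicon


def train(sure: List[Pair], errors: List[Pair]) -> None:
    # Learn a symbol fix when an error word is one character away from a lexicon word.
    for img, text in errors:
        for word in lexicon:
            if len(word) != len(text):
                continue
            diffs = [k for k in range(len(word)) if word[k] != text[k]]
            if len(diffs) == 1:
                table[img[diffs[0]]] = word[diffs[0]]


results, history = adaptive_ocr(["cat", "c@t", "d0g"], recognize, verify, train, threshold=1.0)
```

In the toy run the first pass verifies one word out of three, training learns two symbol corrections from the error words, and the second pass meets the criterion.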
    • 9. Granted invention patent
    • Fast key-in for machine-printed OCR-based systems
    • US08103132B2
    • 2012-01-24
    • US12060150
    • 2008-03-31
    • Asaf Tzadok; Eugeniusz Walach
    • Asaf Tzadok; Eugeniusz Walach
    • G06K9/03
    • G06K9/033
    • A method is presented for correcting results of OCR or other scanned symbols. A document is initially scanned and OCR classification is performed on it. The character/symbol classifications resulting from the OCR are clustered based on shapes. Super-symbols are created based on at least a first difference in the shapes of the clustered characters/symbols exceeding a first threshold. A carpet of super-symbols, emphasizing localized differences in similar symbols, is displayed for analysis testing. Depending on the results of analysis testing, one of the following is performed: (1) the clustered symbols are stored when the carpet of super-symbols passes all of the analysis testing; (2) additional super-symbols are created based on at least a second difference in the shapes of the clustered symbols exceeding a second threshold, and analysis testing is repeated, when the carpet of super-symbols passes most of the analysis testing; or (3) the clustered symbols are rejected and the symbols are manually keyed in when the carpet of super-symbols fails most of the analysis testing.
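The three-way outcome of analysis testing can be illustrated with a short Python sketch. The names `Action`, `triage`, and `difference_mask` are hypothetical. Reading "passes most" as a strict majority, and sending an exact tie to rejection, are assumptions, since the abstract gives no numeric rule. The difference mask is only a simple stand-in for highlighting localized differences among similar symbols, not the patent's super-symbol construction.

```python
from enum import Enum
from typing import List, Sequence


class Action(Enum):
    STORE = "store"    # carpet passed all analysis tests
    REFINE = "refine"  # passed most: build more super-symbols and retest
    REJECT = "reject"  # failed most: fall back to manual key-in


def triage(test_results: Sequence[bool]) -> Action:
    """Map pass/fail results of analysis testing to one of the three outcomes."""
    if not test_results:
        raise ValueError("at least one analysis test result is required")
    passed = sum(test_results)
    if passed == len(test_results):
        return Action.STORE
    if passed * 2 > len(test_results):  # assumption: "most" means a strict majority
        return Action.REFINE
    return Action.REJECT  # assumption: ties are treated conservatively


def difference_mask(glyphs: Sequence[Sequence[Sequence[int]]]) -> List[List[int]]:
    """Mark pixels (1) on which equal-sized 0/1 glyph bitmaps disagree."""
    first = glyphs[0]
    return [
        [int(any(g[r][c] != first[r][c] for g in glyphs)) for c in range(len(first[0]))]
        for r in range(len(first))
    ]


# Toy usage: two 2x2 glyphs that differ in a single pixel.
mask = difference_mask([[[1, 0], [1, 1]], [[1, 0], [0, 1]]])
decisions = [
    triage([True, True, True]),
    triage([True, True, False]),
    triage([True, False, False]),
]
```

Keeping the decision rule separate from the shape comparison makes it easy to swap in a different threshold or a different notion of "most" without touching the bitmap code.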
    • 10. Granted invention patent
    • Adaptive OCR for books
    • US07627177B2
    • 2009-12-01
    • US12276907
    • 2008-11-24
    • Asaf Tzadok; Eugeniusz Walach
    • Asaf Tzadok; Eugeniusz Walach
    • G06K9/18
    • G06K9/03; G06K9/6256
    • A system is presented for scanning entire books or documents all at once using an adaptive process, where the book or document has known fonts and unknown fonts. The known fonts are processed through a verification system where sure words and error words are determined. Both the sure words and error words are sent to OCR training, where they are re-OCR'ed and repeatedly verified until they meet predetermined quality criteria. Characters or words not meeting the predetermined quality criteria receive additional OCR training until all the characters and words pass the predetermined quality criteria. Unknown fonts are scanned and clustered together by shape. Outliers in the shapes are manually keyed in. Those symbols that are manually classified go to OCR training and then to the known type optimization process.