    • 4. Invention application
    • Document clustering
    • US20070083368A1
    • 2007-04-12
    • US11246336
    • 2005-10-07
    • John Handley
    • John Handley
    • G10L15/06
    • G06F17/30011; G06F17/3071
    • Methods and systems for clustering document collections are disclosed. A system for clustering observations may include a processor and a processor-readable storage medium. The processor-readable storage medium may contain one or more programming instructions for performing a method of clustering observations. A plurality of parameter vectors and a plurality of observations may be received. A distribution may also be determined. An optimal partitioning of the observations may then be selected based on the distribution, the parameter vectors and a likelihood function.
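The abstract of US20070083368A1 describes selecting a partitioning of observations from a distribution, a set of parameter vectors, and a likelihood function. The following is a minimal sketch of that idea, not the patented method. It assumes a multinomial distribution over word counts, one fixed parameter vector per cluster, and independent observations, so the likelihood-maximizing partition assigns each observation to its best-scoring parameter vector. The function names and the toy data are hypothetical.

```python
import math


def multinomial_log_likelihood(counts, probs):
    """Log-likelihood of a count vector under a multinomial with the given
    category probabilities. The multinomial coefficient is dropped because
    it depends only on the counts and is identical for every cluster."""
    total = 0.0
    for c, p in zip(counts, probs):
        if c == 0:
            continue
        if p <= 0.0:
            return float("-inf")  # observed a term the cluster cannot emit
        total += c * math.log(p)
    return total


def partition_by_likelihood(observations, parameter_vectors):
    """Assign each observation to the parameter vector that maximizes its
    log-likelihood. Returns (labels, total log-likelihood of the partition)."""
    labels = []
    total = 0.0
    for counts in observations:
        scores = [multinomial_log_likelihood(counts, p) for p in parameter_vectors]
        best = max(range(len(scores)), key=scores.__getitem__)
        labels.append(best)
        total += scores[best]
    return labels, total


# Hypothetical toy data: word counts over a 3-term vocabulary.
parameter_vectors = [
    [0.7, 0.2, 0.1],  # cluster 0 favours the first term
    [0.1, 0.2, 0.7],  # cluster 1 favours the last term
]
observations = [
    [8, 1, 1],
    [0, 2, 9],
    [5, 3, 0],
]

labels, log_lik = partition_by_likelihood(observations, parameter_vectors)
print(labels)  # [0, 1, 0]
```

With fixed parameter vectors the per-observation argmax is sufficient. If the parameters also had to be estimated, an iterative scheme such as EM would be one option, but the abstract does not say which approach the application uses.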
    • 7. Invention application
    • Document analysis systems and methods
    • US20070092140A1
    • 2007-04-26
    • US11254924
    • 2005-10-20
    • John Handley
    • John Handley
    • G06K9/34; G06K9/36
    • G06K9/00456
    • A method embodiment herein begins by capturing a source image. The source image is segmented into first planes. The first planes can each comprise a mask plane and foreground plane combination. The binary images in the first planes are structurally analyzed to identify different regions of text, tables, handwriting, line art, equations, etc., using a document model that has information of size, shape, and spatial arrangement of possible regions. Then, the method extracts (crops out) these regions from the foreground plane to create second mask/foreground plane pairs. Thus, the method creates “second” planes from the first planes, so that a separate second plane is created for each of the regions. Next, tags are associated with each of the second planes (to create tagged mask/foreground plane pairs) and the second planes and associated tags are combined into a mixed raster content (MRC) document. Then, the MRC can be stored and/or transmitted so that the method can perform a separate recognition process (OCR, table recognition, handwriting recognition, etc.) on each of the second planes to produce tagged output.
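The abstract of US20070092140A1 describes cropping identified regions out of a mask/foreground plane pair, tagging each resulting pair, combining them into an MRC document, and then running a separate recognizer per tag. The sketch below illustrates only that data flow under stated assumptions: planes are plain nested lists, the regions and their tags are supplied as input (the structural analysis that finds them is not implemented), and the recognizers are stubs. All class and function names are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Region:
    tag: str     # e.g. "text", "table", "handwriting"
    top: int
    left: int
    bottom: int  # exclusive
    right: int   # exclusive


@dataclass
class TaggedPlane:
    tag: str
    origin: tuple     # (top, left) position within the source image
    mask: list        # cropped binary mask plane
    foreground: list  # cropped foreground plane


def crop(plane, r):
    """Cut the rectangle described by region r out of a 2-D plane."""
    return [row[r.left:r.right] for row in plane[r.top:r.bottom]]


def build_mrc(mask, foreground, regions):
    """Create one tagged mask/foreground pair per region and collect them
    into a list that stands in for the MRC document."""
    return [
        TaggedPlane(r.tag, (r.top, r.left), crop(mask, r), crop(foreground, r))
        for r in regions
    ]


def recognize(mrc, recognizers):
    """Dispatch each tagged plane to the recognizer registered for its tag."""
    return [(p.tag, recognizers[p.tag](p)) for p in mrc if p.tag in recognizers]


# Hypothetical 4x6 source planes and two pre-identified regions.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 1],
]
foreground = [[10] * 6 for _ in range(4)]
regions = [Region("text", 0, 0, 2, 3), Region("table", 2, 3, 4, 6)]


def count_ink(plane):
    """Stub recognizer: counts set mask pixels instead of doing real OCR."""
    return sum(sum(row) for row in plane.mask)


mrc = build_mrc(mask, foreground, regions)
results = recognize(mrc, {"text": count_ink, "table": count_ink})
print(results)  # [('text', 3), ('table', 5)]
```

Keeping the tag alongside each cropped pair is what lets the recognition step run independently per region, which matches the abstract's point that the MRC document can be stored or transmitted before recognition takes place.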