    • 1. Invention publication
    • Automatic categorization of documents using document signatures
    • EP1096391A2
    • 2001-05-02
    • EP00118884.6
    • 2000-08-31
    • Hewlett-Packard Company, A Delaware Corporation
    • Shmueli, Oded; Elad, Michael; Greig, Darryl; Staelin, Carl
    • G06F17/30
    • G06F17/3071; G06K9/00463; Y10S707/99933; Y10S707/99936; Y10S707/99942
    • A method of quickly and automatically comparing a new document to a large number of previously seen documents and identifying the document type. First, provide a plurality of document type distributions; each document type distribution describes layout characteristics of an independent document type and may include a plurality of data points. Each document type distribution includes data derived from at least one basis document signature, which may include data defining pixels of a low-resolution image of the independent basis document resolved to between 1 and 75 dots per inch, or may include document segmentation data derived from the independent basis document. Next, provide a new electronic document. Then create a new document signature from the new electronic document. Next, distances between the new document signature and each of the plurality of document type distributions are calculated using an algorithm based on a Bayesian framework for a Gaussian distribution. The distances calculated may be Euclidean distances or Mahalanobis distances. Additionally, calculating the distances may include weighting the value given to each of a plurality of data points in the document signatures based on the usefulness of each data point in distinguishing between the document signatures. Next, select at least one candidate document type for the new electronic document from among the independent document types described by the plurality of document type distributions. The selection of the at least one candidate document type may include selecting a preselected fixed number of the independent document types, or may include selecting the independent document types described by those document type distributions whose calculated distances are within a preselected threshold distance of the smallest of the distances calculated. In addition, the invention provides for a program storage medium readable by computer, tangibly embodying a program of instructions executable by the computer to perform the method steps described above.
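The steps in the abstract (build a low-resolution signature, measure its distance to each document type, pick candidates) can be sketched roughly as follows. This is only an illustrative sketch, not the patented implementation: the function names, the block-averaging downsampler standing in for "resolving to a low dots-per-inch image", the use of a single mean signature per type, and the sample data are all assumptions made for the example.

```python
import math

def make_signature(image, block):
    """Hypothetical signature: average each block x block cell of a grayscale
    image (a list of pixel rows) to obtain a low-resolution pixel vector."""
    rows, cols = len(image), len(image[0])
    sig = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            cell = [image[r + i][c + j] for i in range(block) for j in range(block)]
            sig.append(sum(cell) / len(cell))
    return sig

def euclidean(a, b):
    """Euclidean distance between two equal-length signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def select_candidates(signature, type_means, top_k=None, threshold=0.0):
    """Rank document types by distance to the new signature, then keep either
    a fixed number (top_k) or every type whose distance is within `threshold`
    of the smallest distance."""
    ranked = sorted((euclidean(signature, mean), name)
                    for name, mean in type_means.items())
    if top_k is not None:
        return [name for _, name in ranked[:top_k]]
    best = ranked[0][0]
    return [name for dist, name in ranked if dist - best <= threshold]

# Made-up 4x4 "page" and made-up per-type mean signatures.
page = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [255, 255, 0, 0],
    [255, 255, 0, 0],
]
type_means = {
    "invoice": [0, 255, 255, 0],
    "letter": [255, 255, 255, 255],
    "form": [10, 250, 250, 10],
}

sig = make_signature(page, 2)
print(select_candidates(sig, type_means, top_k=2))
print(select_candidates(sig, type_means, threshold=20.0))
```

Both selection modes from the abstract are shown: `top_k` corresponds to the preselected fixed number of types, and `threshold` to the preselected distance from the best match.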
    • 2. Invention publication
    • Automatic categorization of documents using document signatures
    • EP1096391A3
    • 2004-05-26
    • EP00118884.6
    • 2000-08-31
    • Hewlett-Packard Company, A Delaware Corporation
    • Shmueli, Oded; Elad, Michael; Greig, Darryl; Staelin, Carl
    • G06F17/30
    • G06F17/3071; G06K9/00463; Y10S707/99933; Y10S707/99936; Y10S707/99942
    • A method of quickly and automatically comparing a new document to a large number of previously seen documents and identifying the document type. First, provide a plurality of document type distributions; each document type distribution describes layout characteristics of an independent document type and may include a plurality of data points. Each document type distribution includes data derived from at least one basis document signature, which may include data defining pixels of a low-resolution image of the independent basis document resolved to between 1 and 75 dots per inch, or may include document segmentation data derived from the independent basis document. Next, provide a new electronic document. Then create a new document signature from the new electronic document. Next, distances between the new document signature and each of the plurality of document type distributions are calculated using an algorithm based on a Bayesian framework for a Gaussian distribution. The distances calculated may be Euclidean distances or Mahalanobis distances. Additionally, calculating the distances may include weighting the value given to each of a plurality of data points in the document signatures based on the usefulness of each data point in distinguishing between the document signatures. Next, select at least one candidate document type for the new electronic document from among the independent document types described by the plurality of document type distributions. The selection of the at least one candidate document type may include selecting a preselected fixed number of the independent document types, or may include selecting the independent document types described by those document type distributions whose calculated distances are within a preselected threshold distance of the smallest of the distances calculated. In addition, the invention provides for a program storage medium readable by computer, tangibly embodying a program of instructions executable by the computer to perform the method steps described above.
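The Mahalanobis distance and the per-data-point weighting mentioned in the abstract might look roughly like the sketch below. It is not taken from the patent text: the diagonal (per-data-point) variance model, the `min_var` floor, the `weights` argument, and the sample numbers are all assumptions chosen to keep the example small.

```python
import math

def fit_distribution(basis_signatures, min_var=1e-6):
    """Estimate a per-data-point mean and variance for one document type from
    its basis signatures (assumed diagonal Gaussian; min_var avoids division
    by zero for data points that never vary)."""
    n = len(basis_signatures)
    dims = len(basis_signatures[0])
    mean = [sum(s[i] for s in basis_signatures) / n for i in range(dims)]
    var = [
        max(sum((s[i] - mean[i]) ** 2 for s in basis_signatures) / n, min_var)
        for i in range(dims)
    ]
    return mean, var

def mahalanobis_diag(signature, mean, var, weights=None):
    """Mahalanobis distance under a diagonal covariance. The optional
    hypothetical `weights` scale each data point by how useful it is assumed
    to be for telling document types apart."""
    weights = weights or [1.0] * len(mean)
    return math.sqrt(sum(
        w * (x - m) ** 2 / v
        for w, x, m, v in zip(weights, signature, mean, var)
    ))

# Two made-up basis signatures for a single document type.
basis = [[0.0, 10.0], [2.0, 14.0]]
mean, var = fit_distribution(basis)

print(mahalanobis_diag([3.0, 12.0], mean, var))
print(mahalanobis_diag([1.0, 16.0], mean, var))
print(mahalanobis_diag([1.0, 16.0], mean, var, weights=[1.0, 0.0]))
```

Dividing by the variance already down-weights data points that fluctuate heavily within a type; the explicit weights are one possible way to express the additional "usefulness" weighting the abstract describes.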