    • 10. Granted invention patent
    • Method for determining the resemblance of documents
    • US06230155B1
    • 2001-05-08
    • US09197928
    • 1998-11-23
    • Andrei Zary Broder; Charles Gregory Nelson
    • Andrei Zary Broder; Charles Gregory Nelson
    • G06F17/30
    • G06F17/3071; Y10S707/99931; Y10S707/99933; Y10S707/99952; Y10S707/99953
    • A method for facilitating the comparison of two computerized documents. The method includes loading a first document into a random access memory (RAM), loading a second document into the RAM, reducing the first document into a first sequence of tokens, reducing the second document into a second sequence of tokens, converting the first set of tokens to a first (multi)set of shingles, converting the second set of tokens to a second (multi)set of shingles, determining a first sketch of the first (multi)set of shingles, determining a second sketch of the second (multi)set of shingles, and comparing the first sketch and the second sketch. The sketches have a fixed size, independent of the size of the documents. The resemblance of two documents is provided using a sketch of each document. The sketches may be computed fairly fast and given two sketches the resemblance of the corresponding documents can be computed in linear time in the size of the sketches.
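
The pipeline in the abstract (tokens, then shingles, then a fixed-size sketch, then a sketch comparison) can be illustrated with a minimal Python sketch. The abstract does not say how the sketch is computed, so the following is an assumption rather than the patented method: it uses a common min-wise hashing scheme, keeping the minimum hash value under each of `k` seeded hash functions. The function names, the word-level tokenizer, the shingle width `w=3`, the sketch size `k=64`, and the use of plain sets instead of multisets are all illustrative choices.

```python
import hashlib
import re


def tokenize(text):
    """Reduce a document to a sequence of lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def shingles(tokens, w=3):
    """Return the set of contiguous w-token subsequences (shingles).

    Assumption: a plain set is used here; the abstract also allows multisets.
    """
    if not tokens:
        return set()
    if len(tokens) < w:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def _hash(shingle, seed):
    """Hash one shingle to a 64-bit integer under a seed-dependent hash."""
    data = " ".join(shingle).encode("utf-8")
    salt = seed.to_bytes(8, "little")
    digest = hashlib.blake2b(data, digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little")


def sketch(shingle_set, k=64):
    """Build a sketch of exactly k integers, whatever the document size.

    Assumption: min-wise hashing, one minimum per seeded hash function.
    """
    if not shingle_set:
        raise ValueError("cannot sketch an empty shingle set")
    return [min(_hash(s, seed) for s in shingle_set) for seed in range(k)]


def estimate_resemblance(sketch_a, sketch_b):
    """Estimate resemblance in time linear in the sketch size."""
    if len(sketch_a) != len(sketch_b):
        raise ValueError("sketches must have the same size")
    matches = sum(x == y for x, y in zip(sketch_a, sketch_b))
    return matches / len(sketch_a)


def exact_resemblance(set_a, set_b):
    """Exact resemblance |A ∩ B| / |A ∪ B| of two shingle sets, for reference."""
    return len(set_a & set_b) / len(set_a | set_b)


if __name__ == "__main__":
    doc1 = "the quick brown fox jumps over the lazy dog near the river bank"
    doc2 = "the quick brown fox jumps over the lazy cat near the river bank"
    s1, s2 = shingles(tokenize(doc1)), shingles(tokenize(doc2))
    print("exact:    ", exact_resemblance(s1, s2))
    print("estimated:", estimate_resemblance(sketch(s1), sketch(s2)))
```

Under this scheme the fraction of matching sketch positions approximates the exact set resemblance, and a larger `k` trades sketch size for a tighter estimate. Comparing two sketches touches each of the `k` entries once, which matches the abstract's claim of linear time in the sketch size.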