    • 1. Invention application
    • Document Classification Using Multiscale Text Fingerprints
    • US20140259157A1
    • 2014-09-11
    • US13790636
    • 2013-03-08
    • Adrian Toma; Marius N. Tibeica
    • Adrian Toma; Marius N. Tibeica
    • H04L29/06
    • H04L63/1408; G06Q50/265; H04L51/12
    • Described systems and methods allow a classification of electronic documents such as email messages and HTML documents, according to a document-specific text fingerprint. The text fingerprint is calculated for a text block of each target document, and comprises a sequence of characters determined according to a plurality of text tokens of the respective text block. In some embodiments, the length of the text fingerprint is forced within a pre-determined range of lengths (e.g. between 129 and 256 characters) irrespective of the length of the text block, by zooming in for short text blocks, and zooming out for long ones. Classification may include, for instance, determining whether an electronic document represents unsolicited communication (spam) or online fraud such as phishing.
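The abstract describes the fingerprint only at a high level, so the following is a minimal Python sketch of the general idea, not the method claimed in the application. It assumes one character per token for mid-length texts, several salted characters per token when "zooming in" on short texts, and one character per group of consecutive tokens when "zooming out" on long texts. The names `fingerprint`, `similarity`, `_char`, the 36-character alphabet, the use of SHA-256, and the `difflib`-based comparison are all illustrative assumptions. Only the 129 to 256 character range comes from the abstract.

```python
import difflib
import hashlib
import re

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789"  # assumed output alphabet
MIN_LEN, MAX_LEN = 129, 256  # example range quoted in the abstract


def _char(data: str, salt: int = 0) -> str:
    """Map a string (plus a salt) to one character of ALPHABET."""
    digest = hashlib.sha256(f"{salt}:{data}".encode("utf-8")).digest()
    return ALPHABET[int.from_bytes(digest[:8], "big") % len(ALPHABET)]


def fingerprint(text: str) -> str:
    """Return a fingerprint whose length lies in [MIN_LEN, MAX_LEN]
    for any non-empty text, regardless of the number of tokens."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    if n == 0:
        return ""
    if n < MIN_LEN:
        # Zoom in: emit k characters per token, k = ceil(MIN_LEN / n).
        k = -(-MIN_LEN // n)
        return "".join(_char(t, j) for t in tokens for j in range(k))
    if n > MAX_LEN:
        # Zoom out: one character per group of `stride` consecutive tokens,
        # stride = ceil(n / MAX_LEN).
        stride = -(-n // MAX_LEN)
        groups = (tokens[i:i + stride] for i in range(0, n, stride))
        return "".join(_char(" ".join(g)) for g in groups)
    # Already in range: one character per token.
    return "".join(_char(t) for t in tokens)


def similarity(fp_a: str, fp_b: str) -> float:
    """Assumed comparison step: ratio in [0, 1] between two fingerprints."""
    return difflib.SequenceMatcher(None, fp_a, fp_b).ratio()
```

A classifier built on this sketch could compare the fingerprint of an incoming message against fingerprints of known spam or phishing samples and flag the message when `similarity` exceeds a chosen threshold. Both the threshold and the reference collection are outside what the abstract specifies.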