    • 2. Granted invention patent
    • System and method for context-based document retrieval
    • Publication number: US06633868B1
    • Publication date: 2003-10-14
    • Application number: US09627617
    • Filing date: 2000-07-28
    • Inventors: Shermann Loyall Min; Constantin Lorenzo Tanno; Zachary Frank Mainen; William Russell Softky
    • IPC: G06F 17/30
    • CPC: G06F 17/30864; Y10S 707/99933; Y10S 707/99934
    • A system and method for document retrieval is disclosed. The invention addresses a major problem in text-based document retrieval: rapidly finding a small subset of documents in a large document collection (e.g. Web pages on the Internet) that are relevant to a limited set of query terms supplied by the user. The invention is based on utilizing information contained in the document collection about the statistics of word relationships (“context”) to facilitate the specification of search queries and document comparison. The method consists of first compiling word relationships into a context database that captures the statistics of word proximity and occurrence throughout the document collection. At retrieval time, a search matrix is computed from a set of user-supplied keywords and the context database. For each document in the collection, a similar matrix is computed using the contents of the document and the context database. Document relevance is determined by comparing the similarity of the search and document matrices. The disclosed system therefore retrieves documents with contextual similarity rather than word frequency similarity, simplifying search specification while allowing greater search precision.
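The abstract outlines a four-step pipeline: compile word-proximity statistics into a context database, build a search matrix from the user's keywords, build a comparable matrix for each document, and rank documents by matrix similarity. The sketch below is one possible reading of those steps under stated assumptions, not the patented method. It assumes the context database is a symmetric table of co-occurrence counts within a fixed token window, that both matrices are indexed by (query keyword, context word), that the document matrix weights each context entry by the document's own term frequency, and that similarity is cosine similarity. The helper names (`build_context_db`, `search_matrix`, `document_matrix`, `cosine`, `rank`), the window size, and the stopword list are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# A "matrix" here is a sparse mapping (row word, column word) -> weight.
Matrix = dict[tuple[str, str], float]

# Hypothetical stopword list; the abstract does not say how common words are handled.
STOPWORDS = {"a", "an", "and", "at", "has", "in", "is", "of", "that", "the"}


def tokenize(text: str) -> list[str]:
    """Lowercase, strip simple punctuation, and drop stopwords."""
    words = (t.strip(".,;:!?\"'()").lower() for t in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def build_context_db(docs: list[str], window: int = 3) -> Matrix:
    """Count how often two distinct words occur within `window` tokens of each other."""
    ctx = defaultdict(float)
    for doc in docs:
        tokens = tokenize(doc)
        for i, a in enumerate(tokens):
            for j in range(i + 1, min(i + 1 + window, len(tokens))):
                b = tokens[j]
                if a != b:
                    ctx[(a, b)] += 1.0
                    ctx[(b, a)] += 1.0  # keep the table symmetric
    return dict(ctx)


def search_matrix(keywords: list[str], ctx: Matrix) -> Matrix:
    """Rows: the query keywords; columns: words seen near them in the collection."""
    keys = {k.lower() for k in keywords}
    return {(a, b): v for (a, b), v in ctx.items() if a in keys}


def document_matrix(keywords: list[str], doc: str, ctx: Matrix) -> Matrix:
    """Same rows and columns as the search matrix, with each entry weighted by how
    often the document itself uses that context word (an assumed weighting)."""
    keys = {k.lower() for k in keywords}
    tf = Counter(tokenize(doc))
    return {(a, b): v * tf[b] for (a, b), v in ctx.items() if a in keys and tf[b] > 0}


def cosine(m1: Matrix, m2: Matrix) -> float:
    """Cosine similarity of two sparse matrices treated as flat vectors."""
    dot = sum(v * m2.get(key, 0.0) for key, v in m1.items())
    n1 = sqrt(sum(v * v for v in m1.values()))
    n2 = sqrt(sum(v * v for v in m2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def rank(keywords: list[str], docs: list[str], ctx: Matrix) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, most relevant first."""
    s = search_matrix(keywords, ctx)
    scores = [cosine(s, document_matrix(keywords, d, ctx)) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order]


if __name__ == "__main__":
    corpus = [
        "The jaguar is a large cat that hunts in the rainforest.",
        "The jaguar sedan has a powerful engine and leather seats.",
        "A large cat hunts prey in the dense rainforest at night.",
        "Engine oil keeps the sedan running smoothly.",
    ]
    ctx = build_context_db(corpus, window=3)
    for i, score in rank(["jaguar", "rainforest"], corpus, ctx):
        print(f"doc {i}: {score:.3f}")
```

Run as a script, this toy example ranks the documents in the order 2, 0, 1, 3. Document 2 never contains "jaguar", yet it shares more of the query keywords' collection-wide context than document 0, which illustrates the abstract's point about ranking by contextual rather than word-frequency similarity. The sketch rescans the whole context table for every document; a real index would more plausibly precompute and prune those rows.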