    • 31. Patent application
    • PREPROCESSING OF TEXT
    • US20120179453A1
    • 2012-07-12
    • US12987469
    • 2011-01-10
    • Rayid Ghani, Chad Cumby, Marko Krema
    • G06F17/27
    • G06F17/27
    • Performance of statistical machine learning techniques, particularly classification techniques applied to the extraction of attributes and values concerning products, is improved by preprocessing a body of text to be analyzed to remove extraneous information. The body of text is split into a plurality of segments. In an embodiment, sentence identification criteria are applied to identify sentences as the plurality of segments. Thereafter, the plurality of segments are clustered to provide a plurality of clusters. One or more of the resulting clusters are then analyzed to identify segments having low relevance to their respective clusters. Such low relevance segments are then removed from their respective clusters and, consequently, from the body of text. As the resulting relevance-filtered body of text no longer includes portions of the body of text containing mostly extraneous information, the reliability of any subsequent statistical machine learning techniques may be improved.
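The preprocessing pipeline in the abstract above can be sketched in a few lines of Python: split the text into sentences, cluster the segments, and drop segments with low relevance to their cluster. This is an illustration under stated assumptions, not the patented method. The regex sentence splitter, bag-of-words vectors, deterministically seeded k-means, leave-one-out relevance score and `min_relevance` threshold are all choices made here, and every function name is hypothetical.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive sentence identification: split after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def vectorize(segment):
    # Bag-of-words term counts over lower-cased alphabetic tokens.
    return Counter(re.findall(r"[a-z]+", segment.lower()))


def cosine(a, b):
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def kmeans(vectors, k, iterations=10):
    # Seed with the first k segments so results are reproducible (an assumption).
    centers = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
                  for v in vectors]
        for c in range(len(centers)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = centroid(members)
    return labels


def relevance(i, vectors, labels):
    # Similarity of segment i to the centroid of the *other* members of its cluster.
    others = [v for j, v in enumerate(vectors) if labels[j] == labels[i] and j != i]
    return cosine(vectors[i], centroid(others)) if others else 0.0


def filter_text(text, k=2, min_relevance=0.2):
    segments = split_sentences(text)
    vectors = [vectorize(s) for s in segments]
    labels = kmeans(vectors, min(k, len(vectors)))
    kept = [s for i, s in enumerate(segments)
            if relevance(i, vectors, labels) >= min_relevance]
    return " ".join(kept)


print(filter_text("The camera sensor is large. Call now for free shipping! "
                  "The camera sensor records sharp video."))
# prints: The camera sensor is large. The camera sensor records sharp video.
```

Relevance is measured against the other members of a cluster. A segment that ends up alone in its cluster therefore scores 0.0 and is removed. That is a deliberate simplification in this sketch. A practical system would likely tune `k` and the threshold on held-out data.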
    • 32. Patent application
    • EXTRACTION OF ATTRIBUTES AND VALUES FROM NATURAL LANGUAGE DOCUMENTS
    • US20120036100A1
    • 2012-02-09
    • US13197906
    • 2011-08-04
    • Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
    • G06N5/02
    • G06F17/27, G06F17/2745
    • One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
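The merging and linking steps described above can be sketched in Python, assuming the classifiers have already tagged each token. The tags here stand in for classifier output, since the abstract does not fix an algorithm. The `ATTR`/`VAL`/`O` tag set and the rule that a value attaches to the nearest preceding attribute are hypothetical choices for illustration.

```python
from itertools import groupby


def merge_phrases(tagged):
    """Merge runs of adjacent tokens that share an ATTR or VAL label into phrases."""
    phrases = []
    for label, group in groupby(tagged, key=lambda pair: pair[1]):
        if label in ("ATTR", "VAL"):
            phrases.append((label, " ".join(token for token, _ in group)))
    return phrases


def link_pairs(phrases):
    """Associate each value phrase with the closest preceding attribute phrase."""
    pairs = []
    current_attr = None
    for label, text in phrases:
        if label == "ATTR":
            current_attr = text
        elif current_attr is not None:
            pairs.append((current_attr, text))
    return pairs


# Hypothetical classifier output for "screen size : 15.6 inches , weight 2 kg".
tagged = [("screen", "ATTR"), ("size", "ATTR"), (":", "O"),
          ("15.6", "VAL"), ("inches", "VAL"), (",", "O"),
          ("weight", "ATTR"), ("2", "VAL"), ("kg", "VAL")]
print(link_pairs(merge_phrases(tagged)))
# prints: [('screen size', '15.6 inches'), ('weight', '2 kg')]
```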
    • 33. Granted patent
    • Automated classification algorithm comprising at least one input-invariant part
    • US08027941B2
    • 2011-09-27
    • US11855493
    • 2007-09-14
    • Katharina Probst, Rayid Ghani
    • G06F15/18, G06E1/00
    • G06N99/005
    • A classification algorithm is separated into one or more input-invariant parts and one or more input-dependent classification parts. The input-invariant parts of the classification algorithm capture the underlying and unchanging relationships between the plurality of data elements being operated upon by the classification algorithm, whereas the one or more classification parts embody the probabilistic labeling of the data elements according to the various classifications. For any given iteration, a user's input is used to modify at least one classification part of the algorithm. Recalculated classification parts (i.e., updated classification results) are determined based on computationally simple combinations of the one or more modified classification parts and the one or more input-invariant parts. Preferably, a graphical user interface is used to solicit user input. In this manner, wait times between user feedback iterations can be dramatically reduced, thereby making application of active learning to classification tasks a practical reality.
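Here is a minimal Python sketch of the split between input-invariant and input-dependent parts. It uses closed-form label propagation as a stand-in, because the abstract does not name the underlying classifier. The matrix (I - alpha*S)^-1 is built from a similarity graph S over the data elements. It plays the role of the input-invariant part and is computed once. Each round of user feedback changes only the seed-label matrix, and the updated labels come from a single matrix product. The graph weights, `alpha` and all function names are assumptions.

```python
def invert(m):
    """Gauss-Jordan inversion with partial pivoting (fine for small, well-conditioned matrices)."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def propagation_matrix(weights, alpha=0.5):
    """Input-invariant part: (I - alpha*S)^-1 with S the row-normalised similarity graph.
    It depends only on the data elements, so it is computed once."""
    n = len(weights)
    s = []
    for row in weights:
        total = sum(row)
        s.append([w / total for w in row])
    m = [[(1.0 if i == j else 0.0) - alpha * s[i][j] for j in range(n)] for i in range(n)]
    return invert(m)


def classify(prop, seeds):
    """Input-dependent part: one product with the current user labels
    (one-hot rows for labelled items, all-zero rows for unlabelled ones)."""
    scores = matmul(prop, seeds)
    return [row.index(max(row)) for row in scores]


weights = [[0, 1, 0, 0],
           [1, 0, 0.1, 0],
           [0, 0.1, 0, 1],
           [0, 0, 1, 0]]
prop = propagation_matrix(weights)                         # computed once
print(classify(prop, [[1, 0], [0, 0], [0, 0], [0, 0]]))    # prints: [0, 0, 0, 0]
print(classify(prop, [[1, 0], [0, 0], [0, 0], [0, 1]]))    # prints: [0, 0, 1, 1]
```

The expensive step, the matrix inversion, is never repeated between feedback rounds. That is the property the abstract credits for shorter waits between user feedback iterations.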
    • 34. Granted patent
    • Auction result prediction with auction insurance
    • US07904378B2
    • 2011-03-08
    • US12816079
    • 2010-06-15
    • Rayid Ghani, Hillery D. Simmons
    • G06Q40/00
    • G06Q30/08, G06Q30/0202, G06Q40/04, G06Q40/08
    • An auction result prediction system predicts auction results. The system may determine item, seller, or auction characteristics from prior or pending auctions. The system also obtains item characteristics of an item for which a result prediction is sought, either by a buyer or by a seller. A price predictor in the system accepts the auction and item characteristics and predicts an auction result based on the characteristics. The system also determines insurance parameters for insuring online auctions, and the insurance parameters may be based on predicted auction results. An insurance policy reflecting the insurance parameters may be offered to an online auction buyer, seller, or other market participant. The insurance policy may insure, for example, that an item for sale will obtain at least a price specified by the insurance policy.
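The prediction-then-insurance flow can be sketched in Python under deliberately simple assumptions. The hypothetical predictor averages the final prices of prior auctions with matching characteristics. The final price is treated as normally distributed around that prediction. The premium for a guaranteed minimum price is the expected shortfall below the guarantee plus a proportional `loading`. None of these modelling choices comes from the abstract.

```python
from statistics import NormalDist, mean, stdev


def predict_price(prior_auctions, item):
    """Hypothetical predictor: mean and spread of final prices from prior
    auctions whose characteristics match the item's."""
    prices = [a["final_price"] for a in prior_auctions
              if all(a.get(key) == value for key, value in item.items())]
    if len(prices) < 2:
        raise ValueError("need at least two comparable auctions")
    return mean(prices), stdev(prices)


def insurance_premium(mu, sigma, guaranteed_price, loading=0.2):
    """Expected shortfall E[max(K - X, 0)] for X ~ N(mu, sigma) and guarantee K,
    scaled up by a proportional loading (an assumed pricing rule)."""
    d = (guaranteed_price - mu) / sigma
    z = NormalDist()
    shortfall = (guaranteed_price - mu) * z.cdf(d) + sigma * z.pdf(d)
    return (1 + loading) * shortfall


prior = [
    {"category": "camera", "condition": "used", "final_price": 100},
    {"category": "camera", "condition": "used", "final_price": 120},
    {"category": "camera", "condition": "used", "final_price": 110},
    {"category": "camera", "condition": "new", "final_price": 200},
]
mu, sigma = predict_price(prior, {"category": "camera", "condition": "used"})
print(round(insurance_premium(mu, sigma, guaranteed_price=110), 2))  # prints: 4.79
```

Raising the guaranteed price raises the premium. That matches the abstract's idea that insurance parameters may be derived from the predicted auction result.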
    • 35. Patent application
    • AUCTION RESULT PREDICTION WITH AUCTION INSURANCE
    • US20100256999A1
    • 2010-10-07
    • US12816079
    • 2010-06-15
    • Rayid Ghani, Hillery D. Simmons
    • G06Q40/00, G06Q10/00, G06Q30/00
    • G06Q30/08, G06Q30/0202, G06Q40/04, G06Q40/08
    • An auction result prediction system predicts auction results. The system may determine item, seller, or auction characteristics from prior or pending auctions. The system also obtains item characteristics of an item for which a result prediction is sought, either by a buyer or by a seller. A price predictor in the system accepts the auction and item characteristics and predicts an auction result based on the characteristics. The system also determines insurance parameters for insuring online auctions, and the insurance parameters may be based on predicted auction results. An insurance policy reflecting the insurance parameters may be offered to an online auction buyer, seller, or other market participant. The insurance policy may insure, for example, that an item for sale will obtain at least a price specified by the insurance policy.
    • 36. Patent application
    • DATA ANONYMIZATION BASED ON GUESSING ANONYMITY
    • US20100162402A1
    • 2010-06-24
    • US12338483
    • 2008-12-18
    • Yaron Rachlin, Katherine Probst, Rayid Ghani
    • G06F21/00
    • G06F21/60, G06F21/6254
    • Privacy is defined in the context of a guessing game based on the so-called guessing inequality. The privacy of a sanitized record, i.e., guessing anonymity, is defined by the number of guesses an attacker needs to correctly guess an original record used to generate a sanitized record. Using this definition, optimization problems are formulated that optimize a second anonymization parameter (privacy or data distortion) given constraints on a first anonymization parameter (data distortion or privacy, respectively). Optimization is performed across a spectrum of possible values for at least one noise parameter within a noise model. Noise is then generated based on the noise parameter value(s) and applied to the data, which may comprise real and/or categorical data. Prior to anonymization, the data may have identifiers suppressed, whereas outlier data values in the noise perturbed data may be likewise modified to further ensure privacy.
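Guessing anonymity can be illustrated with a small Python simulation. The sketch assumes numeric records and additive Gaussian noise with standard deviation `sigma` as the noise model. It also assumes an attacker who guesses candidate originals in order of increasing squared distance to the sanitized record. Under i.i.d. Gaussian noise of equal variance and a uniform prior over candidates, that is the maximum-likelihood order. The guessing anonymity of one sanitized record is the rank of its true original in that order. The grid scan in `smallest_sigma` is a crude stand-in for the optimization in the abstract: it minimizes distortion subject to an average-privacy constraint. All names are hypothetical.

```python
import random


def guessing_anonymity(sanitized, originals, true_index):
    """Number of guesses an attacker needs when trying candidates from the
    closest (most likely) to the farthest."""
    def dist(rec):
        return sum((s - o) ** 2 for s, o in zip(sanitized, rec))
    order = sorted(range(len(originals)), key=lambda i: dist(originals[i]))
    return order.index(true_index) + 1


def mean_anonymity(records, sigma, trials=20, seed=0):
    """Average guessing anonymity when every record is perturbed with N(0, sigma^2) noise."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for idx, rec in enumerate(records):
            noisy = [v + rng.gauss(0.0, sigma) for v in rec]
            total += guessing_anonymity(noisy, records, idx)
    return total / (trials * len(records))


def smallest_sigma(records, target, candidates):
    """Scan noise levels from least to most distorting; return the first that
    meets the privacy target, or None if none does."""
    for sigma in sorted(candidates):
        if mean_anonymity(records, sigma) >= target:
            return sigma
    return None


records = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(mean_anonymity(records, 0.0))  # prints: 1.0 (no noise, first guess is always right)
print(smallest_sigma(records, 1.5, [0.0, 100.0]))
```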
    • 38. Patent application
    • EXTRACTION OF ATTRIBUTES AND VALUES FROM NATURAL LANGUAGE DOCUMENTS
    • US20070282872A1
    • 2007-12-06
    • US11742244
    • 2007-04-30
    • Katharina Probst, Rayid Ghani, Andrew E. Fano, Marko Krema, Yan Liu
    • G06F17/00
    • G06F17/2715, G06F17/241, G06F17/30616
    • One or more classification algorithms are applied to at least one natural language document in order to extract both attributes and values of a given product. Supervised classification algorithms, semi-supervised classification algorithms, unsupervised classification algorithms or combinations of such classification algorithms may be employed for this purpose. The at least one natural language document may be obtained via a public communication network. Two or more attributes (or two or more values) thus identified may be merged to form one or more attribute phrases or value phrases. Once attributes and values have been extracted in this manner, association or linking operations may be performed to establish attribute-value pairs that are descriptive of the product. In a presently preferred embodiment, an (unsupervised) algorithm is used to generate seed attributes and values which can then support a supervised or semi-supervised classification algorithm.
    • 39. Granted patent
    • Sentiment classifiers based on feature extraction
    • US08676730B2
    • 2014-03-18
    • US13179707
    • 2011-07-11
    • Rayid Ghani, Marko Krema
    • G06F15/18
    • G06N99/005
    • A method and apparatus are provided for generating one or more sentiment classifiers from training data, using supervised classification techniques based on features extracted from the training data. The training data includes a plurality of units such as, but not limited to, documents, paragraphs, sentences, and clauses. A feature extraction component extracts a plurality of features from the training data, and a feature value determination component determines a value for each extracted feature based on the frequency at which that feature occurs in the training data. Separately, a class labeling component labels each unit of the training data according to a plurality of sentiment classes to provide labeled training data. Thereafter, a sentiment classifier generation component uses a supervised classification technique to provide at least one sentiment classifier based on the value of each extracted feature and the labeled training data.
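The training pipeline above can be sketched in Python. It uses unigram features whose value is their frequency. Multinomial naive Bayes is swapped in as the supervised classification technique, since the abstract does not name one. The tiny labelled corpus and the function names are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def extract_features(unit):
    # Unigram features; the value of each feature is its frequency in the unit.
    return Counter(re.findall(r"[a-z']+", unit.lower()))


def train(labeled_units):
    """Collect per-class feature frequencies from (text, sentiment_label) pairs."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_units:
        features = extract_features(text)
        class_counts[label] += 1
        feature_counts[label].update(features)
        vocab.update(features)
    return class_counts, feature_counts, vocab


def predict(model, text):
    """Multinomial naive Bayes with add-one smoothing; unseen words are ignored."""
    class_counts, feature_counts, vocab = model
    n_units = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        total = sum(feature_counts[label].values())
        score = math.log(count / n_units)
        for feature, freq in extract_features(text).items():
            if feature in vocab:
                score += freq * math.log((feature_counts[label][feature] + 1)
                                         / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train([
    ("great camera, love it", "positive"),
    ("excellent battery and great screen", "positive"),
    ("terrible battery, hate it", "negative"),
    ("awful screen and terrible support", "negative"),
])
print(predict(model, "love the great screen"))   # prints: positive
print(predict(model, "terrible awful battery"))  # prints: negative
```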
    • 40. Granted patent
    • Entity assessment and ranking
    • US08639682B2
    • 2014-01-28
    • US12344738
    • 2008-12-29
    • Chad Michael Cumby, Katharina Probst, Rayid Ghani
    • G06F7/00
    • G06F17/3053, G06F17/30672, G06F17/30687
    • General entity retrieval and ranking is described. A first set of documents is retrieved from one or more document repositories based on a query formed according to a topic. The first set of documents is characterized based on its first set of metadata values. One or more candidate entities are identified based on the first set of documents, and the original query is thereafter augmented according to a candidate entity. The second set of documents resulting from the augmented query is then characterized in a similar manner. For each candidate entity, the first and second document set characterizations are compared to determine their degree of similarity. Increasingly similar document set characterizations indicate that the candidate entity is increasingly relevant to the original query. Repeating this process for each of the one or more candidate entities can give rise to rankings according to the respective degrees of similarity.
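The retrieve, characterize, augment and compare loop can be sketched in Python over an in-memory corpus. The sketch makes four assumptions:

- Retrieval is a simple all-terms match.
- Each document set is characterized by the distribution of a single `category` metadata field.
- Candidate entities come from a per-document `entities` list.
- Characterizations are compared with cosine similarity.

The corpus and names are hypothetical.

```python
import math
from collections import Counter


def retrieve(docs, terms):
    # Toy retrieval: documents whose text contains every query term.
    return [d for d in docs if all(t in d["text"].lower().split() for t in terms)]


def characterize(doc_set):
    # Characterize a document set by the distribution of one metadata field.
    counts = Counter(d["category"] for d in doc_set)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_entities(docs, query_terms):
    first = retrieve(docs, query_terms)
    baseline = characterize(first)
    candidates = {e for d in first for e in d["entities"]}
    scores = {}
    for entity in candidates:
        augmented = retrieve(docs, query_terms + [entity.lower()])
        scores[entity] = cosine(baseline, characterize(augmented))
    # Most similar characterization first; ties broken alphabetically.
    return sorted(scores, key=lambda e: (-scores[e], e))


docs = [
    {"text": "python tutorial for data analysis pandas", "category": "programming", "entities": ["pandas"]},
    {"text": "python snake habitat in asia", "category": "biology", "entities": ["asia"]},
    {"text": "python programming with pandas dataframes", "category": "programming", "entities": ["pandas"]},
    {"text": "python web programming django", "category": "programming", "entities": ["django"]},
]
print(rank_entities(docs, ["python"]))  # prints: ['django', 'pandas', 'asia']
```

"asia" ranks last because adding it to the query shifts the retrieved set from mostly programming documents to biology. Under the abstract's criterion, that means it is less relevant to the original query.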