Quick Patent Search - Fast search of global patents, free patent database for commercial use - IPRDB

1. Invention application

WO2021058270A1 GATED ATTENTION NEURAL NETWORKS Pending - Published
Publication No.: WO2021058270A1
Publication date: 2021-04-01
Application No.: PCT/EP2020/074913
Filing date: 2020-09-07
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: PARISOTTO, Emilio; SONG, Hasuk; RAE, Jack William; JAYAKUMAR, Siddhant Madhu; JADERBERG, Maxwell Elliot; PASCANU, Razvan; GULCEHRE, Caglar
IPC classification: G06N3/04
Abstract: A system including an attention neural network that is configured to receive an input sequence and to process the input sequence to generate an output is described. The attention neural network includes: an attention block configured to receive a query input, a key input, and a value input that are derived from an attention block input. The attention block includes an attention neural network layer configured to: receive an attention layer input derived from the query input, the key input, and the value input, and apply an attention mechanism to the query input, the key input, and the value input to generate an attention layer output for the attention neural network layer; and a gating neural network layer configured to apply a gating mechanism to the attention block input and the attention layer output of the attention neural network layer to generate a gated attention output.
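The block structure in this abstract can be illustrated with a minimal sketch. The abstract does not specify the gating mechanism, so the sigmoid gate below, the identity query/key/value projections, and the names `gated_attention_block` and `gate_bias` are all assumptions made for illustration, not the claimed design.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_attention_block(block_input, gate_bias=2.0):
    """Combine the block input with the attention output through a gate.

    Assumption: query = key = value = block input (identity projections).
    Assumption: an elementwise sigmoid gate; a large gate_bias keeps the
    gate nearly closed, so the block starts close to an identity map.
    """
    attn_out = attention(block_input, block_input, block_input)
    gated = []
    for x, y in zip(block_input, attn_out):
        gate = [sigmoid(y_d - gate_bias) for y_d in y]
        gated.append([x_d + g * y_d for x_d, g, y_d in zip(x, gate, y)])
    return gated
```

Calling `gated_attention_block([[1.0, 0.0], [0.0, 1.0]])` returns a sequence of the same shape as the input; raising `gate_bias` moves the output toward the unmodified block input.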

2. Invention application

WO2019229125A1 DEEP REINFORCEMENT LEARNING WITH FAST UPDATING RECURRENT NEURAL NETWORKS AND SLOW UPDATING RECURRENT NEURAL NETWORKS Pending - Published
Publication No.: WO2019229125A1
Publication date: 2019-12-05
Application No.: PCT/EP2019/063970
Filing date: 2019-05-29
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: DUNNING, Iain Robert; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot
IPC classification: G06N3/00, G06N3/04, G06N3/08, G06N3/12
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.
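The two-timescale arrangement described in this abstract might look roughly like the toy sketch below. The abstract gives no cell type, update schedule, or action rule, so the tanh cell, the fixed `slow_period`, the thresholded action, and every function name here are hypothetical.

```python
import math

def rnn_step(hidden, inputs, weight=0.5):
    """Toy recurrent cell: each unit mixes its own state with the summed input."""
    total = sum(inputs)
    return [math.tanh(weight * (h + total)) for h in hidden]

def run_agent(observations, slow_period=3, hidden_size=2):
    """Select one action per observation using a slow and a fast recurrent core."""
    slow_h = [0.0] * hidden_size
    fast_h = [0.0] * hidden_size
    actions, slow_trace = [], []
    for t, obs in enumerate(observations):
        if t % slow_period == 0:
            # Assumption: the slow core updates only every slow_period steps.
            slow_h = rnn_step(slow_h, obs)
        # The fast core updates every step; its input includes the slow hidden state.
        fast_h = rnn_step(fast_h, obs + slow_h)
        # Assumption: a toy action rule that thresholds the fast hidden state.
        actions.append(1 if sum(fast_h) >= 0 else 0)
        slow_trace.append(list(slow_h))
    return actions, slow_trace
```

For a four-step episode with `slow_period=3`, the slow hidden state stays constant over the first three steps and changes at the fourth, while the fast state changes at every step.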

3. Invention application

WO2022248734A1 ENHANCING POPULATION-BASED TRAINING OF NEURAL NETWORKS Pending - Published
Publication No.: WO2022248734A1
Publication date: 2022-12-01
Application No.: PCT/EP2022/064563
Filing date: 2022-05-30
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: DALIBARD, Valentin Clement; JADERBERG, Maxwell Elliot
IPC classification: G06N3/00, G06N3/04, G06N3/08
Abstract: Methods, computer systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network for performing a task. The system maintains data specifying (i) a plurality of candidate neural networks and (ii) a partitioning of the plurality of candidate neural networks into a plurality of partitions. The system repeatedly performs operations, including: training each of the candidate neural networks; evaluating each candidate neural network using a respective fitness function for the partition; and for each partition, updating the respective values of the one or more hyperparameters for at least one of the candidate neural networks in the partition based on the respective fitness metrics of the candidate neural networks in the partition. After repeatedly performing the operations, the system selects, from the maintained data, the respective values of the network parameters of one of the candidate neural networks.

4. Invention application

WO2020152364A1 MULTI-AGENT REINFORCEMENT LEARNING WITH MATCHMAKING POLICIES Pending - Published
Publication No.: WO2020152364A1
Publication date: 2020-07-30
Application No.: PCT/EP2020/051839
Filing date: 2020-01-24
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: SILVER, David; VINYALS, Oriol; JADERBERG, Maxwell Elliot
IPC classification: G06N3/00, G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a policy neural network having a plurality of policy parameters and used to select actions to be performed by an agent to control the agent to perform a particular task while interacting with one or more other agents in an environment. In one aspect, the method includes: maintaining data specifying a pool of candidate action selection policies; maintaining data specifying a respective matchmaking policy; and training the policy neural network using a reinforcement learning technique to update the policy parameters. The policy parameters define policies to be used in controlling the agent to perform the particular task.

5. Invention application

WO2023006848A1 TRAINING AGENT NEURAL NETWORKS THROUGH OPEN-ENDED LEARNING Pending - Published
Publication No.: WO2023006848A1
Publication date: 2023-02-02
Application No.: PCT/EP2022/071137
Filing date: 2022-07-27
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: JADERBERG, Maxwell Elliot; CZARNECKI, Wojciech
IPC classification: G06N3/00, G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an agent neural network for use in controlling an agent to perform a plurality of tasks. One of the methods includes maintaining population data specifying a population of one or more candidate agent neural networks; and training each candidate agent neural network on a respective set of one or more tasks to update the parameter values of the parameters of the candidate agent neural networks in the population data, the training comprising, for each candidate agent neural network: obtaining data identifying a candidate task; obtaining data specifying a control policy for the candidate task; determining whether to train the candidate agent neural network on the candidate task; and in response to determining to train the candidate agent neural network on the candidate task, training the candidate agent neural network on the candidate task.

6. Invention application

WO2019101836A1 POPULATION BASED TRAINING OF NEURAL NETWORKS Pending - Published
Publication No.: WO2019101836A1
Publication date: 2019-05-31
Application No.: PCT/EP2018/082162
Filing date: 2018-11-22
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: JADERBERG, Maxwell Elliot; CZARNECKI, Wojciech; GREEN, Timothy Frederick Goldie; DALIBARD, Valentin Clement
IPC classification: G06N3/08, G06N3/12, G06N5/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a neural network. A method includes: training a neural network having a plurality of network parameters to perform a particular neural network task and to determine trained values of the network parameters using an iterative training process having a plurality of hyperparameters, the method comprising: maintaining a plurality of candidate neural networks and, for each of the candidate neural networks, data specifying: (i) respective values of the network parameters for the candidate neural network, (ii) respective values of the hyperparameters for the candidate neural network, and (iii) a quality measure that measures a performance of the candidate neural network on the particular neural network task; and for each of the plurality of candidate neural networks, repeatedly performing additional training operations.
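The per-candidate data named in this abstract (network parameters, hyperparameters, and a quality measure) can be illustrated on a toy problem. The abstract does not say what the "additional training operations" are, so the copy-the-best and perturb-the-learning-rate steps below are an assumed, commonly described variant, and the scalar objective and all names are hypothetical.

```python
import random

def train_step(theta, lr):
    """One gradient step on the toy objective f(theta) = theta**2."""
    return theta - lr * 2.0 * theta

def quality(theta):
    """Quality measure: higher is better, with a maximum of 0 at theta = 0."""
    return -theta ** 2

def population_based_training(num_candidates=4, rounds=20, seed=0):
    rng = random.Random(seed)
    # Each candidate stores (i) a parameter, (ii) a hyperparameter, (iii) a quality measure.
    population = [{"theta": 5.0, "lr": rng.uniform(0.001, 0.1)}
                  for _ in range(num_candidates)]
    for cand in population:
        cand["quality"] = quality(cand["theta"])
    for _ in range(rounds):
        for cand in population:
            cand["theta"] = train_step(cand["theta"], cand["lr"])
            cand["quality"] = quality(cand["theta"])
        best = max(population, key=lambda c: c["quality"])
        worst = min(population, key=lambda c: c["quality"])
        if worst is not best:
            # Assumed exploit step: the worst candidate copies the best one.
            worst["theta"] = best["theta"]
            worst["quality"] = best["quality"]
            # Assumed explore step: perturb the copied learning rate.
            worst["lr"] = best["lr"] * rng.choice([0.8, 1.2])
    return max(population, key=lambda c: c["quality"])
```

`population_based_training()` returns the best candidate record after the final round; in this toy setup its `quality` is higher than the starting value `quality(5.0)`.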

7. Invention application

WO2018083671A1 REINFORCEMENT LEARNING WITH AUXILIARY TASKS Pending - Published
Publication No.: WO2018083671A1
Publication date: 2018-05-11
Application No.: PCT/IB2017/056906
Filing date: 2017-11-04
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: MNIH, Volodymyr; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot; SCHAUL, Tom; SILVER, David; KAVUKCUOGLU, Koray
IPC classification: G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.

8. Invention publication

EP4357976A1 DEEP REINFORCEMENT LEARNING WITH FAST UPDATING RECURRENT NEURAL NETWORKS AND SLOW UPDATING RECURRENT NEURAL Pending - Published
Publication No.: EP4357976A1
Publication date: 2024-04-24
Application No.: EP23206214.1
Filing date: 2019-05-29
Applicant: DeepMind Technologies Limited
Inventors: DUNNING, Iain Robert; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot
IPC classification: G06N3/045, G06N3/044, G06N3/092
CPC classification: G06N3/044, G06N3/045, G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for reinforcement learning. One of the methods includes selecting an action to be performed by the agent using both a slow updating recurrent neural network and a fast updating recurrent neural network that receives a fast updating input that includes the hidden state of the slow updating recurrent neural network.

9. Invention publication

EP4007975A1 GATED ATTENTION NEURAL NETWORKS Pending - Under substantive examination
Publication No.: EP4007975A1
Publication date: 2022-06-08
Application No.: EP20768551.2
Filing date: 2020-09-07
Applicant: DeepMind Technologies Limited
Inventors: PARISOTTO, Emilio; SONG, Hasuk; RAE, Jack William; JAYAKUMAR, Siddhant Madhu; JADERBERG, Maxwell Elliot; PASCANU, Razvan; GULCEHRE, Caglar
IPC classification: G06N3/04

10. Invention publication

EP3899797A1 MULTI-AGENT REINFORCEMENT LEARNING WITH MATCHMAKING POLICIES Pending - Under substantive examination
Publication No.: EP3899797A1
Publication date: 2021-10-27
Application No.: EP20702116.3
Filing date: 2020-01-24
Applicant: DeepMind Technologies Limited
Inventors: SILVER, David; VINYALS, Oriol; JADERBERG, Maxwell Elliot
IPC classification: G06N3/00, G06N3/04, G06N3/08
