Quick Patent Search - Fast search of global patents, free commercial patent database - IPRDB

1. Invention application

WO2019081705A1 USING HIERARCHICAL REPRESENTATIONS FOR NEURAL NETWORK ARCHITECTURE SEARCHING [Under examination: published]
Publication No.: WO2019081705A1
Publication date: 2019-05-02
Application No.: PCT/EP2018/079401
Filing date: 2018-10-26
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: FERNANDO, Chrisantha Thomas; SIMONYAN, Karen; KAVUKCUOGLU, Koray; LIU, Hanxiao; VINYALS, Oriol
IPC classification: G06N3/08, G06N3/04
CPC classification: G06N3/086, G06N3/0454
Abstract: A computer-implemented method for automatically determining a neural network architecture represents a neural network architecture as a data structure defining a hierarchical set of directed acyclic graphs in multiple levels. Each graph has an input, an output, and a plurality of nodes between the input and the output. At each level, a corresponding set of the nodes are connected pairwise by directed edges which indicate operations performed on outputs of one node to generate an input to another node. Each level is associated with a corresponding set of operations. At a lowest level, the operations associated with each edge are selected from a set of primitive operations. The method includes repeatedly generating new sample neural network architectures, and evaluating their fitness. The modification is performed by selecting a level, selecting two nodes at that level, and modifying, removing or adding an edge between those nodes according to operations associated with lower levels of the hierarchy.
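
The mutation step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the primitive operation names, the upper-triangular table encoding, and the helper names `random_motif`, `random_architecture`, and `mutate` are all assumptions made for the example. Operation index 0 stands for "no edge", and index k > 0 refers to the k-th operation available from the level below.

```python
import copy
import random

# Hypothetical primitive operations; the abstract does not name them.
PRIMITIVES = ["identity", "conv3x3", "max_pool"]
NO_EDGE = 0  # op index 0 means "no edge between these two nodes"


def random_motif(num_nodes, num_lower_ops, rng):
    """One DAG as an upper-triangular table: motif[i][j] is the op on edge i -> j (i < j)."""
    return [
        [rng.randint(0, num_lower_ops) if j > i else NO_EDGE for j in range(num_nodes)]
        for i in range(num_nodes)
    ]


def random_architecture(nodes_per_level, motifs_per_level, rng):
    """levels[l] is a list of motifs; motifs at level l use the motifs of level l-1 as ops."""
    levels = []
    num_lower_ops = len(PRIMITIVES)  # the lowest level draws from the primitive set
    for num_nodes, num_motifs in zip(nodes_per_level, motifs_per_level):
        levels.append([random_motif(num_nodes, num_lower_ops, rng) for _ in range(num_motifs)])
        num_lower_ops = num_motifs
    return levels


def mutate(levels, rng):
    """Return a copy with one edge modified, removed, or added at a randomly chosen level."""
    child = copy.deepcopy(levels)
    level = rng.randrange(len(child))                 # select a level
    motif = rng.choice(child[level])                  # select a graph at that level
    num_lower_ops = len(PRIMITIVES) if level == 0 else len(child[level - 1])
    i, j = sorted(rng.sample(range(len(motif)), 2))   # select two nodes, ordered so i < j
    # 0 removes the edge; any other value adds or modifies it.
    # The draw may repeat the current value, in which case the child equals the parent.
    motif[i][j] = rng.randint(0, num_lower_ops)
    return child
```

Restricting edges to i < j keeps every graph acyclic by construction. A search loop would call `mutate` repeatedly and score each child with a fitness function, which is left out here because the abstract does not specify one.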

2. Invention application

WO2018224690A1 GENERATING DISCRETE LATENT REPRESENTATIONS OF INPUT DATA ITEMS [Under examination: published]
Publication No.: WO2018224690A1
Publication date: 2018-12-13
Application No.: PCT/EP2018/065308
Filing date: 2018-06-11
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: KAVUKCUOGLU, Koray; VAN DEN OORD, Aaron Gerard Antonius; VINYALS, Oriol
IPC classification: G06N3/04, G06N3/08
CPC classification: G06N3/0454, G06N3/0472, G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input data items. One of the methods includes receiving an input data item; providing the input data item as input to an encoder neural network to obtain an encoder output for the input data item; and generating a discrete latent representation of the input data item from the encoder output, comprising: for each of the latent variables, determining, from a set of latent embedding vectors in the memory, a latent embedding vector that is nearest to the encoded vector for the latent variable.
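
The nearest-embedding lookup in this abstract can be illustrated with a short sketch. The function names `squared_distance` and `quantize` are hypothetical, and squared Euclidean distance is an assumption: the abstract only says "nearest" without naming a metric. The encoder network itself is omitted; the encoded vectors are taken as given.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(encoded_vectors, codebook):
    """Map each encoded vector to the index and value of its nearest codebook entry."""
    indices = []
    quantized = []
    for vec in encoded_vectors:
        nearest = min(range(len(codebook)), key=lambda k: squared_distance(vec, codebook[k]))
        indices.append(nearest)              # the discrete latent for this variable
        quantized.append(codebook[nearest])  # the embedding vector that replaces it
    return indices, quantized


# One encoded vector per latent variable, and a codebook of three embedding vectors.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
encoded = [[0.9, 0.8], [0.1, 0.2], [0.2, 0.9]]
indices, quantized = quantize(encoded, codebook)
print(indices)  # [1, 0, 2]
```

The list of indices is the discrete latent representation; the matching embedding vectors are what a decoder would consume.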

3. Invention application

WO2018083672A1 ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING [Under examination: published]
Publication No.: WO2018083672A1
Publication date: 2018-05-11
Application No.: PCT/IB2017/056907
Filing date: 2017-11-04
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: VIOLA, Fabio; MIROWSKI, Piotr Wojciech; BANINO, Andrea; PASCANU, Razvan; SOYER, Hubert Josef; BALLARD, Andrew James; KUMARAN, Sudarshan; HADSELL, Raia Thais; SIFRE, Laurent; GOROSHIN, Rostislav; KAVUKCUOGLU, Koray; DENIL, Misha Man Ray
IPC classification: G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.

4. Invention application

WO2018083532A1 TRAINING ACTION SELECTION NEURAL NETWORKS [Under examination: published]
Publication No.: WO2018083532A1
Publication date: 2018-05-11
Application No.: PCT/IB2017/001329
Filing date: 2017-11-03
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: WANG, Ziyu; HEESS, Nicolas, Manfred, Otto; BAPST, Victore; MNIH, Volodymyr; MUNOS, Remi; KAVUKCUOGLU, Koray; DE FREITAS, Joao, Ferdinando, Gomes
IPC classification: G06N3/08, G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
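
The replay-memory part of this abstract can be sketched as below. The class name `TrajectoryReplayMemory`, the fixed-capacity eviction rule, the uniform sampling, and the `update_fn` callback are all assumptions for illustration; the abstract does not say how trajectories are evicted or sampled, and the off-policy actor-critic update itself is deliberately left as a placeholder.

```python
import random
from collections import deque


class TrajectoryReplayMemory:
    """Stores whole trajectories; the oldest one is dropped when capacity is reached."""

    def __init__(self, capacity, seed=None):
        self._trajectories = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, trajectory):
        """Store one trajectory, e.g. a list of (observation, action, reward) tuples."""
        self._trajectories.append(list(trajectory))

    def sample(self):
        """Return one stored trajectory, chosen uniformly at random (an assumption)."""
        index = self._rng.randrange(len(self._trajectories))
        return self._trajectories[index]

    def __len__(self):
        return len(self._trajectories)


def train(memory, update_fn, num_updates):
    """Repeatedly sample a trajectory and hand it to a caller-supplied update step."""
    for _ in range(num_updates):
        update_fn(memory.sample())
```

In use, `update_fn` would be the function that adjusts the policy parameters on the sampled trajectory. Because sampled trajectories were generated by older versions of the policy, that update has to be off-policy, which is the point the abstract makes.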

5. Invention application

WO2019149949A1 DISTRIBUTED TRAINING USING OFF-POLICY ACTOR-CRITIC REINFORCEMENT LEARNING [Under examination: published]
Publication No.: WO2019149949A1
Publication date: 2019-08-08
Application No.: PCT/EP2019/052692
Filing date: 2019-02-05
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: SOYER, Hubert Josef; ESPEHOLT, Lasse; SIMONYAN, Karen; DORON, Yotam; FIROIU, Vlad; MNIH, Volodymyr; KAVUKCUOGLU, Koray; MUNOS, Remi; WARD, Thomas; HARLEY, Timothy James Alexander; DUNNING, Iain
IPC classification: G06N3/04, G06N3/08
CPC classification: G06N3/0454, G06N3/0445, G06N3/084
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.

6. Invention application

WO2018153807A1 ACTION SELECTION FOR REINFORCEMENT LEARNING USING NEURAL NETWORKS [Under examination: published]
Publication No.: WO2018153807A1
Publication date: 2018-08-30
Application No.: PCT/EP2018/054002
Filing date: 2018-02-19
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: OSINDERO, Simon; KAVUKCUOGLU, Koray; VEZHNEVETS, Alexander
IPC classification: G06N3/04, G06N3/08, G06N3/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for a system configured to select actions to be performed by an agent that interacts with an environment. The system comprises a manager neural network subsystem and a worker neural network subsystem. The manager subsystem is configured to, at each of the multiple time steps, generate a final goal vector for the time step. The worker subsystem is configured to, at each of multiple time steps, use the final goal vector generated by the manager subsystem to generate a respective action score for each action in a predetermined set of actions.
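
The division of labor between the two subsystems can be pictured with a toy sketch. Everything concrete here is an assumption made for illustration: the abstract describes neural network subsystems, whereas this example uses a plain linear map for the manager and a dot product between the goal vector and a per-action embedding for the worker. The names `manager_goal` and `worker_action_scores` are hypothetical.

```python
def manager_goal(observation, manager_weights):
    """Toy manager: a linear map from the observation to a goal vector."""
    return [sum(w * o for w, o in zip(row, observation)) for row in manager_weights]


def worker_action_scores(goal, action_embeddings):
    """Toy worker: score each action by the dot product of its embedding with the goal."""
    return {
        action: sum(e * g for e, g in zip(embedding, goal))
        for action, embedding in action_embeddings.items()
    }


# One time step: the manager emits a goal, the worker scores every action against it.
observation = [1.0, 2.0]
goal = manager_goal(observation, manager_weights=[[1.0, 0.0], [0.0, 1.0]])
scores = worker_action_scores(goal, {"left": [1.0, 0.0], "right": [0.0, 1.0]})
best_action = max(scores, key=scores.get)
print(scores, best_action)  # {'left': 1.0, 'right': 2.0} right
```

The structural point is that the worker never sees a reward signal in this step; it only ranks the predetermined set of actions against the goal vector the manager supplies.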

7. Invention application

WO2018153806A1 TRAINING MACHINE LEARNING MODELS [Under examination: published]
Publication No.: WO2018153806A1
Publication date: 2018-08-30
Application No.: PCT/EP2018/054000
Filing date: 2018-02-19
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: GENDRON-BELLEMARE, Marc; MENICK, Lee Jacob; GRAVES, Alexander Benjamin; KAVUKCUOGLU, Koray; MUNOS, Remi
IPC classification: G06N3/08, G06N3/04
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a machine learning model. In one aspect, a method includes receiving training data for training the machine learning model on a plurality of tasks, where each task includes multiple batches of training data. A task is selected in accordance with a current task selection policy. A batch of training data is selected from the selected task. The machine learning model is trained on the selected batch of training data to determine updated values of the model parameters. A learning progress measure that represents a progress of the training of the machine learning model as a result of training the machine learning model on the selected batch of training data is determined. The current task selection policy is updated using the learning progress measure.
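
The loop in this abstract (select a task, select a batch, train, measure progress, update the policy) can be sketched as follows. The specifics are assumptions, not taken from the source: the policy is a softmax over per-task preference scores, the learning progress measure is the drop in loss on the batch, and `train_step` is a hypothetical caller-supplied function that returns the loss before and after the update.

```python
import math
import random


class TaskSelectionPolicy:
    """Softmax over per-task preference scores (an assumed, bandit-style policy)."""

    def __init__(self, num_tasks, step_size=0.1, seed=None):
        self.preferences = [0.0] * num_tasks
        self.step_size = step_size
        self._rng = random.Random(seed)

    def probabilities(self):
        """Numerically stable softmax of the preference scores."""
        top = max(self.preferences)
        exps = [math.exp(p - top) for p in self.preferences]
        total = sum(exps)
        return [e / total for e in exps]

    def select_task(self):
        """Sample a task index according to the current probabilities."""
        tasks = range(len(self.preferences))
        return self._rng.choices(tasks, weights=self.probabilities())[0]

    def update(self, task, learning_progress):
        """Raise the preference of tasks on which training made progress."""
        self.preferences[task] += self.step_size * learning_progress


def run_curriculum(tasks, train_step, policy, num_steps, rng):
    """tasks[t] is a list of batches; train_step(t, batch) returns (loss_before, loss_after)."""
    for _ in range(num_steps):
        task = policy.select_task()                  # select a task under the current policy
        batch = rng.choice(tasks[task])              # select a batch from that task
        loss_before, loss_after = train_step(task, batch)
        policy.update(task, loss_before - loss_after)  # assumed progress measure: loss drop
```

With this rule, tasks whose batches keep reducing the loss are sampled more often, while tasks that are already mastered or currently too hard yield little progress and fade from the sampling distribution.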

8. Invention application

WO2018083671A1 REINFORCEMENT LEARNING WITH AUXILIARY TASKS [Under examination: published]
Publication No.: WO2018083671A1
Publication date: 2018-05-11
Application No.: PCT/IB2017/056906
Filing date: 2017-11-04
Applicant: DEEPMIND TECHNOLOGIES LIMITED
Inventors: MNIH, Volodymyr; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot; SCHAUL, Tom; SILVER, David; KAVUKCUOGLU, Koray
IPC classification: G06N3/04, G06N3/08
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.

9. Invention publication

EP4386624A3 ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING [Under examination: substantive examination]
Publication No.: EP4386624A3
Publication date: 2024-08-07
Application No.: EP24173836.8
Filing date: 2017-11-04
Applicant: DeepMind Technologies Limited
Inventors: VIOLA, Fabio; MIROWSKI, Piotr Wojciech; BANINO, Andrea; PASCANU, Razvan; SOYER, Hubert Josef; BALLARD, Andrew James; KUMARAN, Sudarshan; HADSELL, Raia Thais; SIFRE, Laurent; GOROSHIN, Rostislav; KAVUKCUOGLU, Koray; DENIL, Misha Man Ray
IPC classification: G06N3/006, G06N3/0442, G06N3/045, G06N3/0464, G06N3/084, G06N3/092
CPC classification: G06N3/084, G06N3/006, G06N3/045, G06N3/0464, G06N3/0442, G06N3/092
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters.

10. Invention publication

EP3696737A1 TRAINING ACTION SELECTION NEURAL NETWORKS [Granted]
Publication No.: EP3696737A1
Publication date: 2020-08-19
Application No.: EP20168108.7
Filing date: 2017-11-03
Applicant: Deepmind Technologies Limited
Inventors: WANG, Ziyu; HEESS, Nicolas, Manfred, Otto; BAPST, Victore; MNIH, Volodymyr; MUNOS, Remi; KAVUKCUOGLU, Koray; DE FREITAS, Joao, Ferdinando, Gomes
IPC classification: G06N3/08, G06N3/04, G06N3/00, G06N7/00
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network. One of the methods includes maintaining a replay memory that stores trajectories generated as a result of interaction of an agent with an environment; and training an action selection neural network having policy parameters on the trajectories in the replay memory, wherein training the action selection neural network comprises: sampling a trajectory from the replay memory; and adjusting current values of the policy parameters by training the action selection neural network on the trajectory using an off-policy actor critic reinforcement learning technique.
