    • Invention application
    • REINFORCEMENT LEARNING WITH AUXILIARY TASKS
    • WO2018083671A1
    • 2018-05-11
    • PCT/IB2017/056906
    • 2017-11-04
    • DEEPMIND TECHNOLOGIES LIMITED
    • MNIH, Volodymyr; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot; SCHAUL, Tom; SILVER, David; KAVUKCUOGLU, Koray
    • G06N3/04; G06N3/08
    • Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection policy neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting the values of the respective auxiliary control parameters, the reward prediction parameters, and the action selection policy network parameters.
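The abstract describes one action selection policy network whose intermediate output is shared by auxiliary control heads and a reward prediction head, with the training of every head also adjusting the shared policy network parameters. The Python sketch below illustrates only that parameter sharing, under loudly stated assumptions: the class name `AuxiliaryTaskNetSketch`, the one-layer tanh torso, the network sizes, and the loss functions (a policy-gradient-style `-g * log pi(a)` for the main and auxiliary policy heads, squared error for reward prediction) are hypothetical choices and are not taken from the patent.

```python
import math
import random


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class AuxiliaryTaskNetSketch:
    """Hypothetical toy model: a shared tanh layer feeding three linear heads."""

    def __init__(self, obs_dim, hidden, n_actions, n_aux_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = init(hidden, obs_dim)        # action selection policy network (shared)
        self.P = init(n_actions, hidden)      # main policy output layer
        self.A = init(n_aux_actions, hidden)  # auxiliary control network (one task)
        self.v = init(1, hidden)[0]           # reward prediction network

    def intermediate(self, x):
        # Intermediate output of the policy network, consumed by every head.
        return [math.tanh(u) for u in matvec(self.W, x)]

    def train_step(self, x, action, advantage, aux_action, aux_return, reward, lr=0.1):
        h = self.intermediate(x)
        grad_h = [0.0] * len(h)  # summed gradient of all losses w.r.t. h
        losses = {}

        def policy_head(M, a, g, name):
            # Assumed loss -g * log pi(a); its gradient w.r.t. the logits is
            # g * (pi - onehot(a)).
            p = softmax(matvec(M, h))
            losses[name] = -g * math.log(p[a])
            dz = [g * (pk - (1.0 if k == a else 0.0)) for k, pk in enumerate(p)]
            for k, row in enumerate(M):
                for j in range(len(h)):
                    grad_h[j] += dz[k] * row[j]  # uses the pre-update weight
                    row[j] -= lr * dz[k] * h[j]  # update this head's parameters

        policy_head(self.P, action, advantage, "policy")
        policy_head(self.A, aux_action, aux_return, "aux_control")

        # Reward prediction: squared error of a linear readout of h (assumed loss).
        err = sum(vj * hj for vj, hj in zip(self.v, h)) - reward
        losses["reward"] = 0.5 * err * err
        for j in range(len(h)):
            grad_h[j] += err * self.v[j]
            self.v[j] -= lr * err * h[j]

        # Backpropagate the summed head gradients into the shared parameters.
        for j, row in enumerate(self.W):
            du = grad_h[j] * (1.0 - h[j] ** 2)  # derivative of tanh
            for i, xi in enumerate(x):
                row[i] -= lr * du * xi
        return losses
```

Because `grad_h` accumulates contributions from all three heads before the final loop, the auxiliary control and reward prediction losses move the shared weights `W` even when the main policy term is zero (for example, `advantage=0.0`). This mirrors the abstract's point that training the auxiliary networks also adjusts the action selection policy network parameters.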