    • 2. Invention application
    • REINFORCEMENT LEARNING SYSTEMS
    • WO2018083667A1
    • 2018-05-11
    • PCT/IB2017/056902
    • 2017-11-04
    • DEEPMIND TECHNOLOGIES LIMITED
    • SILVER, David; SCHAUL, Tom; HESSEL, Matteo; VAN HASSELT, Hado Philip
    • G06N3/04; G06N3/08
    • Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for prediction of an outcome related to an environment. In one aspect, a system comprises a state representation neural network that is configured to: receive an observation characterizing a state of an environment being interacted with by an agent and process the observation to generate an internal state representation of the environment state; a prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a predicted subsequent state representation of a subsequent state of the environment and a predicted reward for the subsequent state; and a value prediction neural network that is configured to receive a current internal state representation of a current environment state and process the current internal state representation to generate a value prediction.
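The three networks named in the WO2018083667A1 abstract can be illustrated with a small sketch. This is not the patented implementation: the class `PredictionSystem`, the single dense layer per network, the layer sizes, and the helper `rollout_estimate` (which chains the prediction network over several internal steps and then bootstraps from the value prediction) are all assumptions made here for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Apply a dense layer to the vector x, with an optional activation."""
    weights, biases = layer
    out = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class PredictionSystem:
    """Hypothetical toy version of the three networks named in the abstract."""

    def __init__(self, obs_dim, state_dim, seed=0):
        rng = random.Random(seed)
        # State representation network: observation -> internal state.
        self.representation = make_layer(obs_dim, state_dim, rng)
        # Prediction network: internal state -> (next internal state, reward).
        self.next_state = make_layer(state_dim, state_dim, rng)
        self.reward = make_layer(state_dim, 1, rng)
        # Value prediction network: internal state -> scalar value.
        self.value = make_layer(state_dim, 1, rng)

    def encode(self, observation):
        """Map an observation to an internal state representation."""
        return dense(self.representation, observation, math.tanh)

    def predict(self, state):
        """Return (predicted next internal state, predicted reward for it)."""
        next_state = dense(self.next_state, state, math.tanh)
        reward = dense(self.reward, next_state)[0]
        return next_state, reward

    def predict_value(self, state):
        """Return a scalar value prediction for an internal state."""
        return dense(self.value, state)[0]


def rollout_estimate(system, observation, steps, discount=0.99):
    """Assumed usage: sum discounted predicted rewards over `steps` internal
    steps, then add the discounted value prediction of the final state."""
    state = system.encode(observation)
    total, scale = 0.0, 1.0
    for _ in range(steps):
        state, reward = system.predict(state)
        total += scale * reward
        scale *= discount
    return total + scale * system.predict_value(state)
```

With `steps=0` the estimate reduces to the value prediction of the encoded observation; larger values rely more on the predicted rewards. The weights are random and untrained, so the numbers produced carry no meaning beyond showing the data flow.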
    • 5. Invention application
    • REINFORCEMENT LEARNING WITH AUXILIARY TASKS
    • WO2018083671A1
    • 2018-05-11
    • PCT/IB2017/056906
    • 2017-11-04
    • DEEPMIND TECHNOLOGIES LIMITED
    • MNIH, Volodymyr; CZARNECKI, Wojciech; JADERBERG, Maxwell Elliot; SCHAUL, Tom; SILVER, David; KAVUKCUOGLU, Koray
    • G06N3/04; G06N3/08
    • Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. The method includes: training an action selection policy neural network, and during the training of the action selection neural network, training one or more auxiliary control neural networks and a reward prediction neural network. Each of the auxiliary control neural networks is configured to receive a respective intermediate output generated by the action selection policy neural network and generate a policy output for a corresponding auxiliary control task. The reward prediction neural network is configured to receive one or more intermediate outputs generated by the action selection policy neural network and generate a corresponding predicted reward. Training each of the auxiliary control neural networks and the reward prediction neural network comprises adjusting values of the respective auxiliary control parameters, reward prediction parameters, and the action selection policy network parameters.
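The parameter sharing described in the WO2018083671A1 abstract can be sketched as follows. This is a toy illustration, not the patented method: the class `AgentWithAuxiliaryHeads`, the single tanh trunk standing in for the action selection policy network's intermediate output, the linear heads, and the squared-error loss are assumptions. Only the reward prediction update is written out; an auxiliary control head would be trained analogously, with its own loss also flowing into the shared trunk.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, x):
    """Multiply a matrix (list of rows) by the vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class AgentWithAuxiliaryHeads:
    """Hypothetical toy model: one shared trunk and three linear heads."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        # Shared trunk: part of the action selection policy network parameters.
        self.trunk = random_matrix(hidden_dim, obs_dim, rng)
        self.policy_head = random_matrix(num_actions, hidden_dim, rng)
        # Auxiliary control parameters (policy for an auxiliary control task).
        self.aux_head = random_matrix(num_actions, hidden_dim, rng)
        # Reward prediction parameters.
        self.reward_head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]

    def intermediate(self, obs):
        """Intermediate output of the policy network, consumed by all heads."""
        return [math.tanh(v) for v in matvec(self.trunk, obs)]

    def policy(self, obs):
        return softmax(matvec(self.policy_head, self.intermediate(obs)))

    def auxiliary_policy(self, obs):
        return softmax(matvec(self.aux_head, self.intermediate(obs)))

    def predict_reward(self, obs):
        hidden = self.intermediate(obs)
        return sum(w * h for w, h in zip(self.reward_head, hidden))

    def reward_prediction_step(self, obs, observed_reward, lr=0.1):
        """One gradient step on 0.5 * (prediction - reward)^2.

        Updates both the reward head and the shared trunk, and returns the
        loss measured before the update.
        """
        hidden = self.intermediate(obs)
        error = sum(w * h for w, h in zip(self.reward_head, hidden)) - observed_reward
        for i, h in enumerate(hidden):
            # Backpropagate through tanh using the head weight before its update.
            back = error * self.reward_head[i] * (1.0 - h * h)
            for j, x in enumerate(obs):
                self.trunk[i][j] -= lr * back * x
            self.reward_head[i] -= lr * error * h
        return 0.5 * error * error
```

Because the trunk is shared, a reward prediction step also changes what `policy` and `auxiliary_policy` compute for the same observation, which is the coupling the abstract describes when it says the action selection policy network parameters are adjusted as well.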