    • 2. Granted invention patent
    • Control system and technique employing reinforcement learning having stability and learning phases
    • US06665651B2
    • 2003-12-16
    • US10197731
    • 2002-07-18
    • Peter M. Young; Charles Anderson; Douglas C. Hittle; Matthew Kretchmar
    • Peter M. Young; Charles Anderson; Douglas C. Hittle; Matthew Kretchmar
    • G06F15/18
    • G05B13/027
    • A feedback control system for automatic on-line training of a controller for a plant, the system having a reinforcement learning agent connected in parallel with the controller. The learning agent comprises an actor network and a critic network operatively arranged to carry out at least one sequence of a stability phase followed by a learning phase. During the stability phase, a multi-dimensional boundary of values is determined. During the learning phase, a plurality of updated weight values is generated in connection with the on-line training, if and until one of the updated weight values reaches the boundary, at which time a next sequence is carried out to determine a next multi-dimensional boundary of values followed by a next learning phase. Also, a method for automatic on-line training of a feedback controller within a system comprising the controller and a plant by employing a reinforcement learning agent comprising a neural network to carry out at least one sequence comprising a stability phase followed by a learning phase. Further included, a computer executable program code on a computer readable storage medium, for on-line training of a feedback controller within a system comprising the controller and a plant.
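The alternation the abstract describes, a stability phase that fixes a boundary followed by a learning phase that runs until a weight reaches it, can be sketched roughly as below. This is only an illustrative sketch, not the patented method: the abstract does not say how the boundary is computed, so `stability_phase` here is a hypothetical stand-in that draws a fixed-radius box around the current weights, and `update`, `radius`, and `total_steps` are assumed names introduced for the example.

```python
def stability_phase(weights, radius):
    """Hypothetical stand-in: one (low, high) bound per weight.

    The abstract only says a multi-dimensional boundary is determined;
    a fixed-radius box around the current weights is an assumption.
    """
    return [(w - radius, w + radius) for w in weights]


def reached_boundary(weights, bounds):
    """True if any single weight sits on its lower or upper bound."""
    return any(w <= lo or w >= hi for w, (lo, hi) in zip(weights, bounds))


def learning_phase(weights, bounds, update, max_steps):
    """Apply updates until a weight reaches the boundary or steps run out."""
    steps = 0
    while steps < max_steps:
        candidate = update(weights)
        steps += 1
        # Clip so the weights never leave the box fixed by the stability phase.
        weights = [min(max(w, lo), hi) for w, (lo, hi) in zip(candidate, bounds)]
        if reached_boundary(weights, bounds):
            break
    return weights, steps


def train(weights, update, total_steps, radius=0.5):
    """Run stability/learning sequences until the step budget is spent."""
    sequences = 0
    remaining = total_steps
    while remaining > 0:
        bounds = stability_phase(weights, radius)  # start of a new sequence
        sequences += 1
        weights, used = learning_phase(weights, bounds, update, remaining)
        remaining -= used
    return weights, sequences


if __name__ == "__main__":
    # Toy update rule (assumed): push every weight up by a constant step.
    final, count = train([0.0], lambda ws: [w + 0.2 for w in ws], total_steps=10)
    print(final, count)
```

In an actual controller the update would come from the actor and critic networks and the boundary from a stability analysis of the closed loop; the sketch only shows the control flow in which reaching the boundary triggers the next sequence.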