Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control
Files
(Submitted version)
Date
2021
Authors
Tang, D.
Chen, L.
Tian, Z.
Hu, E.
Type
Journal article
Citation
International Journal of Control, 2021; 94(5):1321-1333
Statement of Responsibility
Difan Tang, Lei Chen, Zhao Feng Tian and Eric Hu
Abstract
This study proposes a modified value-function-approximation (MVFA) and investigates its use under a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI), to deliver a compact implementation of online optimal control synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms eliminate actor tuning but require stabilising mechanisms in the critic tuning law. This paper therefore studies an alternative single-critic realisation that aims to relax the need for such stabilising mechanisms. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equation by solving for the associated value function via SPI in a single-critic configuration. Different from other existing single-critic methods, an MVFA is proposed to deal with closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters in the interest of not complicating the problem. Parameter convergence and closed-loop stability are examined. The proposed MVFA-based approach yields an alternative single-critic SPI method with uniformly ultimately bounded closed-loop stability during online learning, without the need for stabilising mechanisms in the critic tuning law. The proposed approach is verified via simulations.
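To make the single-critic scheme described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's MVFA or its exact tuning law: a scalar control-affine plant x_dot = f(x) + g(x)u, a value function approximated as V(x) ~ w^T phi(x), the control computed from the current critic, and the critic weights adjusted synchronously by normalised gradient descent on the Hamilton-Jacobi-Bellman residual. The plant dynamics, basis functions, gains, and step sizes are all assumptions chosen for illustration.

# Illustrative single-critic synchronous policy iteration (assumed example,
# not the paper's MVFA). Plant: x_dot = f(x) + g(x)*u; running cost
# Q*x^2 + R*u^2; critic: V(x) ~ w^T phi(x), phi(x) = [x^2, x^4].
import numpy as np

alpha = 5.0            # critic learning rate (assumed)
Q, R = 1.0, 1.0        # state/control cost weights (assumed)
dt, steps = 0.001, 20000

def f(x):              # drift of a simple nonlinear plant (assumed)
    return -x + 0.5 * x**3

def g(x):              # input gain (assumed constant)
    return 1.0

def phi_grad(x):       # gradient of the basis phi(x) = [x^2, x^4]
    return np.array([2.0 * x, 4.0 * x**3])

w = np.array([0.1, 0.1])   # critic weights, small nonzero initial guess
x = 1.0                     # initial state

for _ in range(steps):
    dphi = phi_grad(x)
    # Control from the current critic: u = -(1/2) R^{-1} g^T dV/dx
    u = -0.5 * (1.0 / R) * g(x) * (dphi @ w)
    xdot = f(x) + g(x) * u
    # HJB residual (Bellman error) under the current weights
    e = (dphi @ w) * xdot + Q * x**2 + R * u**2
    # Normalised gradient-descent critic update, synchronous with control
    sigma = dphi * xdot
    w = w - dt * alpha * sigma / (1.0 + sigma @ sigma)**2 * e
    x = x + dt * xdot       # forward-Euler plant integration

print("critic weights:", w, " final state:", x)

In practice such critic updates need a persistently exciting signal (e.g. probing noise added to u) for the weights to converge to their true values; the sketch omits this, and it includes none of the stabilising modifications that the paper's MVFA is designed to avoid.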
Description
Published online: 11 Aug 2019.
Rights
© 2019 Informa UK Limited, trading as Taylor & Francis Group