Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control
Files
(Submitted version)
Date
2021
Authors
Tang, D.
Chen, L.
Tian, Z.
Hu, E.
Type
Journal article
Citation
International Journal of Control, 2021; 94(5):1321-1333
Statement of Responsibility
Difan Tang, Lei Chen, Zhao Feng Tian and Eric Hu
Abstract
This study proposes a modified value-function-approximation (MVFA) and investigates its use under a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI), to deliver a compact implementation of online optimal control synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms eliminate actor tuning but require stabilising mechanisms in the critic tuning law. This paper therefore studies an alternative single-critic realisation that aims to relax the need for such stabilising mechanisms. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equation by solving for the associated value function via SPI in a single-critic configuration. Different from other existing single-critic methods, an MVFA is proposed to deal with closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters in the interest of not complicating the problem. Parameter convergence and closed-loop stability are examined. The proposed MVFA-based approach yields an alternative single-critic SPI method with uniformly ultimately bounded closed-loop stability during online learning, without the need for stabilising mechanisms in the critic tuning law. The proposed approach is verified via simulations.
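To make the single-critic scheme described in the abstract concrete, the following is a minimal illustrative sketch, not the paper's MVFA or its exact tuning law: a scalar control-affine plant x_dot = f(x) + g(x)u, a value function approximated as V(x) ~ w^T phi(x), the control computed from the current critic, and the critic weights adjusted synchronously by normalised gradient descent on the Hamilton-Jacobi-Bellman residual. The plant dynamics, basis functions, gains, and step sizes are all assumptions chosen for illustration.

# Illustrative single-critic synchronous policy iteration (assumed example,
# not the paper's MVFA). Plant: x_dot = f(x) + g(x)*u; running cost
# Q*x^2 + R*u^2; critic: V(x) ~ w^T phi(x), phi(x) = [x^2, x^4].
import numpy as np

alpha = 5.0            # critic learning rate (assumed)
Q, R = 1.0, 1.0        # state/control cost weights (assumed)
dt, steps = 0.001, 20000

def f(x):              # drift of a simple nonlinear plant (assumed)
    return -x + 0.5 * x**3

def g(x):              # input gain (assumed constant)
    return 1.0

def phi_grad(x):       # gradient of the basis phi(x) = [x^2, x^4]
    return np.array([2.0 * x, 4.0 * x**3])

w = np.array([0.1, 0.1])   # critic weights, small nonzero initial guess
x = 1.0                     # initial state

for _ in range(steps):
    dphi = phi_grad(x)
    # Control from the current critic: u = -(1/2) R^{-1} g^T dV/dx
    u = -0.5 * (1.0 / R) * g(x) * (dphi @ w)
    xdot = f(x) + g(x) * u
    # HJB residual (Bellman error) under the current weights
    e = (dphi @ w) * xdot + Q * x**2 + R * u**2
    # Normalised gradient-descent critic update, synchronous with control
    sigma = dphi * xdot
    w = w - dt * alpha * sigma / (1.0 + sigma @ sigma)**2 * e
    x = x + dt * xdot       # forward-Euler plant integration

print("critic weights:", w, " final state:", x)

In practice such critic updates need a persistently exciting signal (e.g. probing noise added to u) for the weights to converge to their true values; the sketch omits this, and it includes none of the stabilising modifications that the paper's MVFA is designed to avoid.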
Description
Published online: 11 Aug 2019.
Rights
© 2019 Informa UK Limited, trading as Taylor & Francis Group