Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control

dc.contributor.author: Tang, D.
dc.contributor.author: Chen, L.
dc.contributor.author: Tian, Z.
dc.contributor.author: Hu, E.
dc.date.issued: 2021
dc.description: Published online: 11 Aug 2019.
dc.description.abstract: This study proposes a modified value-function-approximation (MVFA) and investigates its use under a single-critic configuration based on neural networks (NNs) for synchronous policy iteration (SPI), to deliver a compact implementation of online optimal-control synthesis for control-affine continuous-time nonlinear systems. Existing single-critic algorithms eliminate actor tuning but require stabilising mechanisms in the critic tuning law. This paper therefore studies an alternative single-critic realisation that aims to relax the need for such stabilising mechanisms. Optimal control laws are determined from the Hamilton-Jacobi-Bellman equality by solving for the associated value function via SPI in a single-critic configuration. Unlike other existing single-critic methods, an MVFA is proposed to deal with closed-loop stability during online learning. Gradient-descent tuning is employed to adjust the critic NN parameters in the interests of not complicating the problem. Parameter convergence and closed-loop stability are examined. The proposed MVFA-based approach yields an alternative single-critic SPI method with uniformly ultimately bounded closed-loop stability during online learning, without the need for stabilising mechanisms in the critic tuning law. The proposed approach is verified via simulations.
dc.description.statementofresponsibility: Difan Tang, Lei Chen, Zhao Feng Tian and Eric Hu
dc.identifier.citation: International Journal of Control, 2021; 94(5):1321-1333
dc.identifier.doi: 10.1080/00207179.2019.1648874
dc.identifier.issn: 0020-7179
dc.identifier.issn: 1366-5820
dc.identifier.orcid: Tang, D. [0000-0002-7143-0441]
dc.identifier.orcid: Chen, L. [0000-0002-2269-2912]
dc.identifier.orcid: Tian, Z. [0000-0001-9847-6004]
dc.identifier.orcid: Hu, E. [0000-0002-7390-0961]
dc.identifier.uri: http://hdl.handle.net/2440/124816
dc.language.iso: en
dc.publisher: Taylor & Francis
dc.rights: © 2019 Informa UK Limited, trading as Taylor & Francis Group
dc.source.uri: https://www.tandfonline.com/
dc.subject: Adaptive dynamic programming; approximate dynamic programming; neural networks; nonlinear control; optimal control; policy iteration
dc.title: Modified value-function-approximation for synchronous policy iteration with single-critic configuration for nonlinear optimal control
dc.type: Journal article
pubs.publication-status: Published
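The abstract describes gradient-descent critic tuning against the Hamilton-Jacobi-Bellman residual in a single-critic configuration. The record does not give the paper's MVFA itself, but the general single-critic idea can be illustrated with a minimal, hypothetical sketch: a scalar control-affine plant dx/dt = -x + u with cost ∫(x² + u²)dt, whose optimal value function is known to be V*(x) = (√2 − 1)x², so a one-term critic with basis φ(x) = x² should drive its weight toward √2 − 1 ≈ 0.414. All names and parameter values below are illustrative assumptions, not the authors' algorithm.

```python
# Hypothetical sketch (NOT the paper's MVFA): single-critic gradient-descent
# tuning on the HJB residual for dx/dt = f(x) + g(x)u, f(x) = -x, g(x) = 1,
# running cost x^2 + u^2 (Q = R = 1). Critic: V_hat(x) = W * phi(x), phi = x^2.

def f(x): return -x            # drift term
def g(x): return 1.0           # input gain
def dphi(x): return 2.0 * x    # gradient of the basis phi(x) = x^2

W = 0.0          # critic weight
alpha = 5.0      # critic learning rate (illustrative choice)
dt = 1e-3        # Euler integration step
x = 1.0          # plant state

for k in range(200_000):
    if k % 500 == 0:
        x = 1.0                                      # periodic reset keeps the state exciting
    u = -0.5 * g(x) * dphi(x) * W                    # policy implied by the critic (R = 1)
    sigma = dphi(x) * (f(x) + g(x) * u)              # regressor: d/dt of V_hat along the trajectory
    e = sigma * W + x**2 + u**2                      # Hamiltonian (HJB) residual
    W -= alpha * dt * sigma * e / (1 + sigma**2)**2  # normalised gradient descent on e^2
    x += dt * (f(x) + g(x) * u)                      # integrate the plant

print(round(W, 3))  # approaches sqrt(2) - 1, i.e. about 0.414
```

Note the single-critic structure the abstract refers to: the control `u` is computed directly from the critic weight `W` at every step, so no separate actor network or actor tuning law is needed; only the critic is adapted.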

Files

Original bundle
Name: hdl_124816.pdf
Size: 3 MB
Format: Adobe Portable Document Format
Description: Submitted version