Online Transfer-enabled Temporal Difference Learning for Discrete-time Markov Jump Systems
Date
2026
Authors
Xue, H.
Wen, J.
Shi, P.
Luan, X.
Type:
Journal article
Citation
IEEE Transactions on Automation Science and Engineering, 2026; 23:1315-1326
Statement of Responsibility
Huiwen Xue, Jiwei Wen, Peng Shi, and Xiaoli Luan
Abstract
This paper investigates the robust control problem of Markov jump linear systems (MJLSs) with unknown transition probabilities (TPs). While existing temporal difference learning (TDL) methods eliminate the need for precise TP values, they often overlook the speed of convergence. We therefore propose an online transfer-enabled temporal difference learning (TTDL) method that exploits prior knowledge from similar yet different systems to accelerate convergence and improve the estimation accuracy of the decision matrices. Specifically, a transfer estimator is constructed by combining control parameters from the source domain with mode trajectories from the target domain to approximate the target decision matrices. At the beginning of each learning episode, this estimator is incorporated into the value function through an adaptive transfer mechanism, which uses source knowledge only when it is reliable and suppresses it near convergence, thereby avoiding negative transfer and yielding rapid policy updates. Theoretical analysis provides a rigorous convergence proof for the value function in the online TTDL method. Comparative experiments validate the method's effectiveness and highlight its reliability in data-scarce scenarios. An application to an aero-engine system further demonstrates its practical applicability and efficiency.

Note to Practitioners—Although industrial system setups are continually updated, most existing methods overlook the reuse of historical data, which causes slow convergence and poor robustness. To address these challenges, we propose an online transfer-enabled temporal difference learning (TTDL) method for Markov jump linear systems that reuses prior knowledge from a similar yet different system. The method achieves fast convergence and high estimation accuracy, making it a promising solution for data-scarce scenarios. Simulation results, including an aero-engine application, validate its efficiency and effectiveness. Practitioners should note that the efficacy of this transfer method hinges on similarity prerequisites: the source and target systems must share identical mode sets, system dimensions, and control problem formulations, along with similar system parameters, transition probabilities, and performance indices. Implementation requires the source domain's decision matrices and control policies, together with mode-jump trajectories from the target domain. Computationally, the complexity is O(N·n_x²·T·N_t), where N is the total number of modes, n_x is the state dimension, and T×N_t is the total data length.
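The adaptive transfer idea summarized above can be illustrated with a toy example. The sketch below runs mode-trajectory TD learning on a hypothetical two-mode scalar MJLS (the dynamics, cost weights, transition probabilities, and decay schedule are made-up numbers, not taken from the paper), blending a source-domain value estimate into the learner's estimate with a weight that shrinks across episodes; the paper's actual transfer estimator and reliability gating are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-mode scalar MJLS used purely for illustration; all
# numbers below are assumptions, not values from the paper.
A = np.array([0.8, 0.5])        # closed-loop dynamics per mode: x+ = A_i x
Q = np.array([1.0, 1.0])        # stage cost weight per mode
TP = np.array([[0.7, 0.3],      # target transition probabilities
               [0.4, 0.6]])     # (treated as unknown by the learner)
gamma = 0.9                     # discount factor

# Reference solution of the coupled fixed point
#   P_i = Q_i + gamma * A_i^2 * sum_j TP[i, j] * P_j
P_true = np.ones(2)
for _ in range(500):
    P_true = Q + gamma * A**2 * (TP @ P_true)

P_src = P_true + 0.3            # source-domain estimate: similar, not equal

P = np.zeros(2)                 # learner's per-mode value estimate
alpha, beta0, rho = 0.05, 0.5, 0.8
for ep in range(200):
    # Adaptive transfer at the start of each episode: the source estimate
    # dominates early and is suppressed (beta -> 0) near convergence.
    beta = beta0 * rho**ep
    P = (1 - beta) * P + beta * P_src
    i = rng.integers(2)
    for _ in range(200):        # TD updates along a sampled mode trajectory
        j = rng.choice(2, p=TP[i])   # sampled jump replaces the unknown TPs
        td_err = Q[i] + gamma * A[i]**2 * P[j] - P[i]
        P[i] += alpha * td_err
        i = j
```

Here the transfer weight simply decays geometrically; the paper's mechanism additionally checks the reliability of the source knowledge online before using it, which is what guards against negative transfer.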
Rights
© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.