Online Transfer-enabled Temporal Difference Learning for Discrete-time Markov Jump Systems
Date
2026
Authors
Xue, H.
Wen, J.
Shi, P.
Luan, X.
Type:
Journal article
Citation
IEEE Transactions on Automation Science and Engineering, 2026; 23:1315-1326
Statement of Responsibility
Huiwen Xue, Jiwei Wen, Peng Shi, and Xiaoli Luan
Abstract
This paper investigates the robust control problem of Markov jump linear systems (MJLSs) with unknown transition probabilities (TPs). While existing temporal difference learning (TDL) methods eliminate the need for precise TP values, they often overlook the speed of convergence. We therefore propose an online transfer-enabled temporal difference learning (TTDL) method that exploits prior knowledge from similar yet different systems to accelerate convergence and improve the estimation accuracy of the decision matrices. Specifically, a transfer estimator is constructed by combining control parameters from the source domain with mode trajectories from the target domain to approximate the target decision matrices. At the beginning of each learning episode, this estimator is incorporated into the value function through an adaptive transfer mechanism, which uses source knowledge only when it is reliable and suppresses it near convergence, thereby avoiding negative transfer and yielding rapid policy updates. Theoretical analysis provides a rigorous convergence proof for the value function in the online TTDL method. Comparative experiments validate the method's effectiveness and highlight its reliability in data-scarce scenarios. An application to an aero-engine system further demonstrates its practical applicability and efficiency.

Note to Practitioners—Although industrial system setups are continually updated, most existing methods overlook the reuse of historical data, which causes slow convergence and poor robustness. To address these challenges, we propose an online transfer-enabled temporal difference learning (TTDL) method for Markov jump linear systems that reuses prior knowledge from a similar yet different system. The method achieves fast convergence and high estimation accuracy, making it a promising solution for data-scarce scenarios. Simulation results, including an aero-engine application, validate its efficiency and effectiveness. Practitioners should note that the efficacy of this transfer method hinges on similarity prerequisites: the source and target systems must share identical mode sets, system dimensions, and control problem formulations, along with similar system parameters, transition probabilities, and performance indices. Implementation requires the source domain's decision matrices and control policies, together with mode-jump trajectories from the target domain. Computationally, the complexity is O(N·n_x²·T·N_t), where N is the total number of modes, n_x is the state dimension, and T×N_t is the total data length.
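The adaptive transfer idea summarized above can be illustrated with a toy example. The sketch below runs mode-trajectory TD learning on a hypothetical two-mode scalar MJLS (the dynamics, cost weights, transition probabilities, and decay schedule are made-up numbers, not taken from the paper), blending a source-domain value estimate into the learner's estimate with a weight that shrinks across episodes; the paper's actual transfer estimator and reliability gating are more elaborate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-mode scalar MJLS used purely for illustration; all
# numbers below are assumptions, not values from the paper.
A = np.array([0.8, 0.5])        # closed-loop dynamics per mode: x+ = A_i x
Q = np.array([1.0, 1.0])        # stage cost weight per mode
TP = np.array([[0.7, 0.3],      # target transition probabilities
               [0.4, 0.6]])     # (treated as unknown by the learner)
gamma = 0.9                     # discount factor

# Reference solution of the coupled fixed point
#   P_i = Q_i + gamma * A_i^2 * sum_j TP[i, j] * P_j
P_true = np.ones(2)
for _ in range(500):
    P_true = Q + gamma * A**2 * (TP @ P_true)

P_src = P_true + 0.3            # source-domain estimate: similar, not equal

P = np.zeros(2)                 # learner's per-mode value estimate
alpha, beta0, rho = 0.05, 0.5, 0.8
for ep in range(200):
    # Adaptive transfer at the start of each episode: the source estimate
    # dominates early and is suppressed (beta -> 0) near convergence.
    beta = beta0 * rho**ep
    P = (1 - beta) * P + beta * P_src
    i = rng.integers(2)
    for _ in range(200):        # TD updates along a sampled mode trajectory
        j = rng.choice(2, p=TP[i])   # sampled jump replaces the unknown TPs
        td_err = Q[i] + gamma * A[i]**2 * P[j] - P[i]
        P[i] += alpha * td_err
        i = j
```

Here the transfer weight simply decays geometrically; the paper's mechanism additionally checks the reliability of the source knowledge online before using it, which is what guards against negative transfer.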
Rights
© 2025 IEEE. All rights reserved, including rights for text and data mining, and training of artificial intelligence and similar technologies. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.