The Role of Invariant Feature Selection and Causal Reinforcement Learning in Developing Robust Financial Trading Algorithms

Date

2025

Authors

Cao, Haiyao

Advisors

Shi, Javen Qinfeng
Abbasnejad, Ehsan
School of Computer and Mathematical Sciences

Type:

Thesis

Abstract

In financial trading, where stock price fluctuations and the positions held in assets directly determine profits and losses, ever-changing market dynamics and the entangled latent variables behind the market compound the challenge of profit-making. There is therefore a pressing need for algorithms that can navigate shifting markets and disentangle the latent variables within them.

Our initial research focused on analyzing shifts in market distributions, leading to the development of InvariantStock, a rule-based algorithm within a prediction-based learning framework, designed to select features and learn patterns that withstand market shifts. However, its inherent rigidity and reliance on invariant features limit its ability to maximize profitability.

To address this limitation and to disentangle the latent variables, we transitioned to a Reinforcement Learning (RL)-based approach suited to the Partially Observable Markov Decision Process (POMDP) nature of financial trading and the underlying dynamics in the latent space behind the market. We developed a novel theory that relaxes stringent earlier assumptions, such as the need for invertible mappings from latent variables to observations and the division of the latent space into independent subsets. Our approach ensures that preserving transitions and rewards is sufficient to disentangle the underlying states (content) from the noisy style variables in general POMDP problems.

Building on this theoretical foundation, we created a world model that integrates these disentanglement techniques with RL: it continually disentangles content from style and optimizes decision-making through a policy network. We first tested the model on DeepMind Control (DMC) tasks with distractors, where it proved effective in traditional RL scenarios by separating content variables (the robots' states) from style variables that often exhibit spurious correlations.
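To make the transition- and reward-preservation idea concrete, the following is a minimal numerical sketch, not the thesis implementation: an encoder splits the latent code into content and style, and only the content is asked to predict the next content state and the reward. All dimensions, the linear maps, and the function names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

obs_dim, content_dim, style_dim, act_dim = 8, 3, 2, 2

# Hypothetical encoder: a fixed random linear map whose output is split
# into content (task-relevant state) and style (noise / distractors).
W_enc = rng.normal(size=(content_dim + style_dim, obs_dim))

def encode(obs):
    z = W_enc @ obs
    return z[:content_dim], z[content_dim:]  # (content, style)

# Transition and reward models read content only: training them to match
# observed transitions and rewards is the preservation signal that is
# claimed sufficient to identify content in a general POMDP.
W_trans = rng.normal(size=(content_dim, content_dim + act_dim))
w_rew = rng.normal(size=content_dim + act_dim)

def preservation_loss(obs_t, act_t, rew_t, obs_next):
    s_t, _ = encode(obs_t)        # style latent is deliberately unused
    s_next, _ = encode(obs_next)
    sa = np.concatenate([s_t, act_t])
    trans_loss = np.sum((W_trans @ sa - s_next) ** 2)  # transition preservation
    rew_loss = (w_rew @ sa - rew_t) ** 2               # reward preservation
    return trans_loss + rew_loss

obs_t = rng.normal(size=obs_dim)
obs_next = rng.normal(size=obs_dim)
act_t = rng.normal(size=act_dim)
loss = preservation_loss(obs_t, act_t, 0.5, obs_next)
print(float(loss))
```

In a full world model these maps would be learned networks and the loss would be minimized jointly with a policy objective; the point here is only that neither loss term ever touches the style latent.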
Extending this theory and algorithm to financial trading, we introduced a causal graph that separates the underlying dynamics into market dynamics and portfolio dynamics. By adhering to the transition- and reward-preservation constraints, our model effectively distinguishes content variables, the direct drivers of stock price changes, from the style variables of the financial market. Adapting this robust world model to the stock market enables an RL agent to make optimal trading decisions by accurately identifying influential content variables and disregarding irrelevant style variables.
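The market/portfolio split above can be illustrated with a toy simulation, under assumptions of my own: prices evolve exogenously (market dynamics), while cash and holdings evolve only through the agent's trades (portfolio dynamics), and the reward is the change in total portfolio value. The function names and reward definition are for this sketch only.

```python
import numpy as np

def step_portfolio(cash, holdings, prices, trade):
    """Apply a trade (shares bought per asset; negative = sell) at current prices."""
    cash = cash - float(trade @ prices)
    holdings = holdings + trade
    return cash, holdings

def portfolio_value(cash, holdings, prices):
    return cash + float(holdings @ prices)

prices_t = np.array([10.0, 20.0])
prices_next = np.array([11.0, 19.0])  # market dynamics: exogenous price move

cash, holdings = 100.0, np.zeros(2)
# Portfolio dynamics: buy 2 shares of asset 0 and 1 share of asset 1.
cash, holdings = step_portfolio(cash, holdings, prices_t, np.array([2.0, 1.0]))

# Reward = change in total portfolio value across the price move.
v_t = portfolio_value(cash, holdings, prices_t)       # trade itself is value-neutral
v_next = portfolio_value(cash, holdings, prices_next)
reward = v_next - v_t
print(v_t, v_next, reward)
```

Note that the trade changes the composition but not the value of the portfolio at time t; the reward comes entirely from the exogenous price move interacting with the chosen holdings, which is what makes the two sets of dynamics separable in the causal graph.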

Dissertation Note

Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2025

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
