Title: The Role of Invariant Feature Selection and Causal Reinforcement Learning in Developing Robust Financial Trading Algorithms
Author: Cao, Haiyao
Advisors: Shi, Javen Qinfeng; Abbasnejad, Ehsan
School: School of Computer and Mathematical Sciences
Date issued: 2025
Date available: 2025-06-16
URI: https://hdl.handle.net/2440/145212
Language: en
Subjects: Reinforcement learning; Causality; Financial trading
Type: Thesis

Abstract:
In financial trading, where stock price fluctuations and the positions held in assets directly determine profits and losses, ever-changing market dynamics and the entangled latent variables behind the market compound the challenge of profit-making. There is therefore a strong demand for algorithms that can navigate shifting markets and disentangle the latent variables that drive them. Our initial research focused on analyzing shifts in market distributions and led to InvariantStock, a rule-based algorithm within a prediction-based learning framework that selects features and learns patterns that withstand market shifts. However, its inherent rigidity and reliance on invariant features limit its ability to maximize profitability.

To address this limitation and to disentangle the latent variables, we transitioned to a Reinforcement Learning (RL) approach, which suits both the Partially Observable Markov Decision Process (POMDP) nature of financial trading and the underlying latent-space dynamics of the market. We developed a novel theory that relaxes stringent assumptions made in prior work, such as the need for an invertible mapping from latent variables to observations and the division of the latent space into independent subsets. Our theory shows that preserving transitions and rewards is sufficient to disentangle the underlying states (content) from the noisy style variables in general POMDP problems. Building on this foundation, we created a world model that integrates these disentanglement techniques with RL: the model continually disentangles content from style and optimizes decision-making through a policy network. We first evaluated the model on DeepMind Control (DMC) tasks with distractors, where it proved effective in traditional RL settings by separating the content variables (the robots' states) from the style variables, which often exhibit spurious correlations.

Extending this theory and algorithm to financial trading, we introduced a causal graph that separates the underlying dynamics into market dynamics and portfolio dynamics. By adhering to the transition- and reward-preservation constraints, our model effectively distinguishes the content variables, the direct influencers of stock price changes, from the style variables of the financial market. We adapted this robust world model to the stock market, enabling an RL agent to make optimal trading decisions by identifying the influential content variables and disregarding the irrelevant style variables.
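To make the transition- and reward-preservation idea from the abstract concrete, the following is a minimal PyTorch-style sketch, not the thesis's actual implementation: the latent state is split into a content part and a style part, and only the content part is asked to predict the next content and the reward. All module names, dimensionalities, and the dummy batch are assumptions introduced here for illustration.

```python
# Minimal sketch (assumed names and shapes, not the thesis implementation):
# a world model whose latent splits into content z_c and style z_s, trained
# only with transition- and reward-preservation losses on the content part.
import torch
import torch.nn as nn

OBS_DIM, ACT_DIM, C_DIM, S_DIM = 64, 4, 16, 16  # illustrative sizes

class DisentangledWorldModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(OBS_DIM, C_DIM + S_DIM)     # o_t -> (z_c, z_s)
        self.transition = nn.Linear(C_DIM + ACT_DIM, C_DIM)  # (z_c, a_t) -> next z_c
        self.reward_head = nn.Linear(C_DIM, 1)               # reward from content only

    def encode(self, obs):
        z = self.encoder(obs)
        return z[:, :C_DIM], z[:, C_DIM:]  # (content, style)

    def loss(self, obs, act, next_obs, reward):
        z_c, _ = self.encode(obs)        # style is never used downstream
        z_c_next, _ = self.encode(next_obs)
        pred_next = self.transition(torch.cat([z_c, act], dim=-1))
        pred_rew = self.reward_head(z_c).squeeze(-1)
        # Transition preservation: content alone must predict the next content.
        trans_loss = (pred_next - z_c_next.detach()).pow(2).mean()
        # Reward preservation: content alone must predict the reward.
        rew_loss = (pred_rew - reward).pow(2).mean()
        return trans_loss + rew_loss

# Usage on a dummy batch:
model = DisentangledWorldModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, act = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM)
next_obs, rew = torch.randn(32, OBS_DIM), torch.randn(32)
opt.zero_grad()
model.loss(obs, act, next_obs, rew).backward()
opt.step()
```

In practice an objective like this would be paired with additional terms (for example reconstruction or regularization) to rule out trivial solutions; the sketch only shows where the two preservation constraints enter and how the style variables are excluded from the prediction targets.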