Theses
Included in the Research Theses collection are Masters, PhD and Professional Doctorate theses.
Masters and PhD by coursework theses are included in the Honours and Coursework Theses collection. Some schools may choose to include Honours theses that make a significant contribution to knowledge.
If a school submits undergraduate theses or final year projects to the Digital Library, these will be included with the school's other collections.
News
For further information about the collection, please contact the Library Theses team.
Browse
Browsing Theses by Advisors "Abbasnejad, Ehsan"
Now showing 1 - 6 of 6
Item Open Access
Domain Generalisation in Reinforcement Learning (2025)
Orenstein, Adrian; Reid, Ian; Abbasnejad, Ehsan; School of Computer and Mathematical Sciences
Deep reinforcement learning (RL) aims to learn a general policy for an agent acting in an environment. The policy learns state representations from observations using deep neural networks and therefore inherits their defects. For instance, neural networks suffer from shortcut learning, in which models latch on to rudimentary input patterns, leading to a lack of generalisation. Common remedies are to (1) obtain larger datasets for better data coverage, (2) train neural networks with more parameters, or (3) include more input variations, either by simply adding them to the training set (i.e. data augmentation) or by additionally regularising the objective with information on these variations (i.e. domain information, as in domain generalisation approaches). However, while these approaches have been investigated in supervised learning, their effectiveness remains underexplored in RL. In this thesis, we investigate methods of improving the generalisation capability of on-policy RL agents. We conduct our investigation on a relatively new procedural benchmark named ProcGen (Cobbe et al., 2020), where for a particular game various levels are procedurally generated, each receiving its own domain label, giving us an ideal platform to investigate domain generalisation methods developed for supervised learning and their efficacy in on-policy RL. We find that utilising domain information does indeed improve generalisation performance. We apply a supervised learning method, AND-mask (Parascandolo et al., 2021), to a PPO (Schulman et al., 2017) agent and find that when the agent does not get many variations of the same environment to learn from, AND-mask effectively regularises the learned representations and enables the agent to generalise better to novel domains. We also identify a limitation of AND-mask that restricts scalability when learning from more training domains. This investigation results in agents that generalise more effectively when the diversity of training domains is limited, which is beneficial when gathering more diverse data in the real world is costly. Lastly, we explore how model scale, the number of training samples, and the diversity of those samples contribute to generalisation performance. Our investigation focuses on the on-policy case in RL, where data gathered by the policy is more difficult for deep neural networks to learn from because it is highly correlated with the policy. In other words, the data gathered by the policy is not independent and identically distributed (i.i.d.), which is often an assumption required for generalisation. Furthermore, as the agent learns and its behaviour continually changes, the dataset the agent gathers in its experience replay is non-stationary.
Given these two complications, non-i.i.d. data and non-stationarity, we observe that larger models using a single backbone to extract features are more effective at generalisation than smaller, decoupled networks, regardless of the number of domains provided during training.

Item Open Access
Interactive Vision and Language Learning (2022)
Parvaneh, Amin; Shi, Javen Qinfeng; Abbasnejad, Ehsan; School of Computer Science
Effective and efficient interaction with humans in real environments is an appealing though challenging task for an artificial agent. Despite recent advances in deep learning, especially in vision and language learning, there are still unsolved issues on the way to such an ambitious agent. Three critical aspects of human-machine interaction via natural language (e.g. to create intelligent assistants) are: (1) for the model to understand and anticipate human intents so as to participate consistently in conversations, (2) to learn from a small set of instances and seek the information the model needs to accurately achieve its goals, and (3) to generalise from that small number of observations obtained under human supervision so that the agent can be used in practice. For human intent perception, we propose an inclusive model for the visual negotiation task, where the intelligent agent needs to anticipate human intent while communicating via natural language. Our model exploits online resources, searching for similar items to estimate the fair agreement price humans might set as their goal. Considering the estimated agreement price of the advertised item as well as its visual and textual features (i.e. images and textual descriptions), we build competitive and consistent language and price generation policies that negotiate significantly better than other baselines. For the information-seeking aspect, we propose an effective active learning (AL) method that facilitates learning with less labelled data by seeking a small subset of unlabelled instances that, when labelled and used for model training, yields the highest test accuracy. We propose efficient interpolations in the feature space between unlabelled and labelled samples to identify unlabelled instances whose neighbourhoods have inconsistent class predictions. After requesting labels for the selected subset from a human expert, we achieve the highest performance boost in the retrained model compared with other AL methods. In particular, our method achieves remarkable results in low-data regimes on high-dimensional data, where the performance of other AL methods is unsatisfactory. Finally, regarding generalisation, we equip the agent with the capability of reasoning about counterfactual scenarios, which discourages the model's propensity to focus on spurious features or memorise seen environments. For that, we let the model intervene on the visual and textual features of the input in a causal model and create counterfactual samples that, together with the real observations, are used to train the model. Hence, the trained model is more resilient to the effect of spurious features and biases in the data and generalises better to unseen situations.
Additionally, to improve generalisation to unseen environments in more interactive applications, we propose a novel approach that generates counterfactual environments and makes the agent learn from both the observations and the actions in those counterfactual environments. After formalising the supervised and reinforcement learning objectives to include both real and counterfactual environments, our trained agent generalises significantly better than other baselines to unseen environments in two challenging vision-and-language navigation tasks.

Item Open Access
The Role of Invariant Feature Selection and Causal Reinforcement Learning in Developing Robust Financial Trading Algorithms (2025)
Cao, Haiyao; Shi, Javen Qinfeng; Abbasnejad, Ehsan; School of Computer and Mathematical Sciences
In financial trading, where stock price fluctuations and the positions held in assets directly influence profits and losses, the complexities of ever-changing market dynamics and the entangled latent variables behind the market compound the challenge of profit-making. There is therefore an intense desire for algorithms capable of navigating these shifting markets and disentangling the latent variables within them. Our initial research focused on analyzing shifts in market distributions, leading to the development of InvariantStock, a rule-based algorithm within a prediction-based learning framework. This algorithm is designed to select features and learn patterns that withstand market shifts. However, its inherent rigidity and reliance on invariant features limit its ability to maximize profitability. To address this limitation and disentangle the latent variables, we transitioned to a Reinforcement Learning (RL)-based approach, appropriate for the Partially Observable Markov Decision Process (POMDP) nature of financial trading and for the underlying dynamics in the latent space behind the market. We developed a novel theory that relaxes stringent previous assumptions, such as the need for invertible mappings from latent variables to observations and the division of the latent space into independent subsets. Our approach ensures that preserving transitions and rewards is sufficient to disentangle the underlying states (content) from the noisy style variables in general POMDP problems. Building on this theoretical foundation, we created a world model that integrates the disentanglement techniques with RL. This model constantly disentangles content and style and optimizes decision-making through a policy network. We initially tested this model with distractors in DeepMind Control (DMC) tasks. It proved effective in traditional RL scenarios by separating content variables (the robots' states) from style variables that often exhibit spurious correlations. Extending this theory and algorithm to financial trading, we introduced a causal graph that separates the underlying dynamics into market and portfolio dynamics. By adhering to transition and reward preservation constraints, our model effectively distinguishes content variables, the direct influencers of stock price changes, from the style variables of the financial market.
We have adapted this robust world model to the stock market, enabling an RL agent to make optimal trading decisions by accurately identifying influential content variables and disregarding irrelevant style variables.

Item Open Access
Towards Better Efficiency and Generalization in Imitation Learning: A Causal Perspective (2024)
Jabri, Mohamed Khalil; Shi, Javen Qinfeng; Abbasnejad, Ehsan; School of Computer and Mathematical Sciences
Imitation Learning, also known as Learning from Demonstrations, has emerged as a practical alternative to reinforcement learning, mitigating the intricate challenges associated with reward engineering in the latter. However, imitation learning agents often face limitations that hinder their effectiveness in realistic scenarios; sample efficiency and generalization pose notable challenges among these limitations. Concurrently, there has been increasing acknowledgment of causality's significance in improving learning-based approaches, resulting in its recent prominence within the machine learning community. This dissertation explores the potential of causality-inspired approaches to address the aforementioned challenges in imitation learning through two distinct contributions. The first contribution introduces a novel method to enhance goal-conditioned imitation learning using Structural Causal Models (SCMs) and counterfactual data. We leverage SCMs as a formalism for understanding the inherent causal relationships between the variables governing expert behavior. This enables the generation of counterfactual data, which we use to learn improved reward functions from less data, thereby enhancing the agent's efficiency. The second contribution focuses on identifying causal features that remain consistent across different environments. Unlike many works on domain generalization, our method is equally applicable to Reinforcement Learning and Imitation Learning, eschews the need for domain supervision, and remains agnostic to data modality, rendering it broadly applicable. Through empirical evaluation, this dissertation establishes the efficacy of causality-inspired approaches in advancing imitation learning capabilities. The proposed methodologies not only contribute to overcoming some fundamental limitations of existing imitation learning algorithms but also provide valuable insights into the broader application of causality in machine learning.

Item Open Access
Towards Robust Deep Neural Networks (2022)
Doan, Gia Bao; Ranasinghe, Damith Chinthana; Abbasnejad, Ehsan; School of Computer Science
Deep neural networks (DNNs) enable state-of-the-art performance for most machine learning tasks. Unfortunately, they are vulnerable to attacks, such as Trojans during training and Adversarial Examples at test time. Adversarial Examples are inputs with carefully crafted perturbations added to benign samples. In the Computer Vision domain, although the perturbations are imperceptible to humans, Adversarial Examples can successfully misguide or fool DNNs. Meanwhile, Trojan or backdoor attacks involve attackers tampering with the training process, for example by injecting poisoned training data to embed a backdoor into the network that can be activated during deployment when the Trojan triggers (known only to the attackers) appear in the model's inputs. This dissertation investigates methods of building robust DNNs against these training-time and test-time threats.
Recognising the threat of Adversarial Examples in the malware domain, this research considers the problem of realising a DNN-based malware detector that is robust against Adversarial Example attacks by developing a Bayesian adversarial learning algorithm. In contrast to vision tasks, adversarial learning is hard in a domain without a differentiable or invertible mapping function from the problem space (such as software code inputs) to the feature space. The study proposes an alternative: performing adversarial learning in the feature space and proving that the projection of perturbed yet valid malware from the problem space into the feature space is a subset of feature-space adversarial attacks. The Bayesian approach improves benign performance, provably bounds the difference between adversarial risk and empirical risk, and improves robustness against increasingly large attack budgets not employed during training.

To improve the robustness of DNNs against Adversarial Examples (carefully crafted perturbations added to inputs) in the Computer Vision domain, the research then considers the problem of developing a Bayesian learning algorithm that realises a robust DNN. Accordingly, a novel Bayesian learning method is designed that conceptualises an information gain objective to measure, and force to be similar, the information learned from benign and Adversarial Examples. The method proves that minimising this information gain objective further tightens the bound on the difference between adversarial risk and empirical risk, moving towards a principled method of adversarially training BNNs.

Recognising the threat from backdoor or Trojan attacks against DNNs, the research considers the problem of finding a robust defence method that is effective against Trojan attacks. The research explores a new idea in the domain, sanitisation of inputs, and proposes Februus to neutralise highly potent and insidious Trojan attacks on DNN systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted into a deep neural network model using a secret trigger, a Trojan, applied to any input to alter the model's decision to a target prediction, a target determined by and known only to the attacker. Februus sanitises the incoming input by surgically removing the potential trigger artifacts and restoring the input for the classification task. Februus enables effective Trojan mitigation by sanitising inputs, with no loss of performance on sanitised inputs, trojaned or benign. This method is highly effective at defending against advanced Trojan attack variants as well as challenging, adaptive attacks where attackers have full knowledge of the defence method.

Investigating the connections between Trojan attacks and spatially constrained Adversarial Examples, or so-called Adversarial Patches, in the input space, the research exposes an emerging threat: an attack that exploits the vulnerability of a DNN to generate naturalistic adversarial patches as universal triggers. For the first time, a method based on Generative Adversarial Networks is developed that exploits a GAN's latent space to search for universal naturalistic adversarial patches.
The proposed attack's advantage is its ability to exert a high level of control, enabling attackers to craft naturalistic adversarial patches that are highly effective, robust against state-of-the-art DNNs, and deployable in the physical world without needing to interfere with the model building process or risking discovery. Until now, this has only been demonstrably possible using Trojan attack methods.

Item Open Access
Towards Robust Deep Neural Networks: Query Efficient Black-Box Adversarial Attacks and Defences (2023)
Vo, Quoc Viet; Ranasinghe, Damith Chinthana; Abbasnejad, Ehsan; School of Computer and Mathematical Sciences
Deep neural networks (DNNs) have been recognized for their remarkable ability to achieve state-of-the-art performance across numerous machine learning tasks. However, DNN models are susceptible to attacks in the deployment phase, where Adversarial Examples (AEs) present significant threats. Generally, in the Computer Vision domain, adversarial examples are maliciously modified inputs that look similar to the original input and are constructed under white-box settings by adversaries with full knowledge of and access to a victim model. But recent studies have shown that the ability to extract information solely from the output of a machine learning model, in order to craft adversarial perturbations against black-box models, is a practical threat to real-world systems. This is significant because of the growing number of Machine Learning as a Service (MLaaS) providers, including Google, Microsoft and IBM, and of applications incorporating these models. Therefore, this dissertation studies the weaknesses of DNNs to attacks in black-box settings and seeks to develop mechanisms that can defend DNNs against these attacks. Recognising the practical ability of adversaries to exploit simply the classification decision (predicted label) from a trained model's access interface, distinguished as a decision-based attack, the research in Chapter 3 first delves into recent state-of-the-art decision-based attacks that employ approximate gradient estimation or random search methods. These attacks aim at discovering lp (p > 0) constrained adversarial instances, dubbed dense attacks. The research then develops a robust class of query-efficient attacks capable of avoiding entrapment in a local minimum and misdirection from noisy gradients, as seen in gradient estimation methods. The proposed attack method, RAMBOATTACK, exploits the notion of Randomized Block Coordinate Descent to explore the hidden classifier manifold, targeting perturbations that manipulate only localized input features to address the entrapment in local minima encountered by gradient estimation methods. In contrast to dense attacks, recent studies have realised l0 constrained adversarial instances, dubbed sparse attacks, in white-box settings. This demonstrates that machine learning models are more vulnerable than we believe. However, these sparse attacks have not been well studied in the most challenging scenario, the decision-based setting. Furthermore, the sparse attacks' aim of minimizing the number of perturbed pixels, measured by the l0 norm, leads to (i) an NP-hard problem and (ii) a non-differentiable search space. Recognizing the lack of studies on sparse attacks in a decision-based setting and the challenges of the NP-hard problem and non-differentiable search space, the research in Chapter 4 explores decision-based sparse attacks and develops an evolution-based algorithm, SPARSEEVO, for handling these challenges.
The results of comprehensive experiments in this research show that SPARSEEVO requires significantly fewer model queries than the state-of-the-art sparse attack for both untargeted and targeted attacks. Importantly, the query-efficient SPARSEEVO, and decision-based attacks in general, raise new questions regarding the safety of deployed systems and pose new directions for studying and understanding the robustness of machine learning models. Extracting information solely from the confidence scores of a machine learning model can considerably reduce the query budget required to attack a victim model. But, similar to sparse attacks in decision-based settings, constructing sparse adversarial attacks, even when models serve confidence scores in response to queries, is non-trivial because of the resulting NP-hard problem and non-differentiable search space. To this end, the study in Chapter 5 develops BRUSLEATTACK, a new algorithm built upon a Bayesian framework for the problem, and evaluates it against Convolutional Neural Networks, Vision Transformers, recent Stylized ImageNet models, defense methods, and Machine Learning as a Service (MLaaS) offerings exemplified by Google Cloud Vision. Through extensive experiments, the proposed attack achieves state-of-the-art attack success rates and query efficiency on standard computer vision tasks across various models. Understanding and recognizing the vulnerability of Deep Learning models to adversarial attacks in various black-box scenarios has compelled the exploration of mechanisms to defend them. Therefore, the research in Chapter 6 explores different defense approaches and proposes a more effective mechanism to defend against black-box attacks. In particular, the research aims to integrate uncertainty into model outputs to mislead black-box attacks by randomly selecting a single model, or a subset of well-trained models, to make predictions for query inputs. The uncertainty in the output scores over sequences of queries hampers attack algorithms' attempts to estimate gradients or to search for directions toward an adversarial example. Since the uncertainty in the output scores can be increased through the diversity of the model set, the research investigates different techniques to promote model diversity. Through comprehensive experiments, the research demonstrates that the Stein Variational Gradient Descent method with a novel sample loss objective encourages greater diversity than other techniques. Overall, both introducing uncertainty into the output scores and promoting diversity of the model set, as studied in this research, greatly enhance the defense capability against black-box attacks with minimal impact on model performance.
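The defence described in the last abstract is concrete enough to sketch: each query is answered by a randomly chosen member (or subset) of a diverse, already-trained model pool, so repeated queries see varying confidence scores. The minimal Python sketch below only illustrates that general idea; it is not code from the thesis, and the class name, method names, and the assumption of a pre-trained pool (for example one diversified with SVGD) are all hypothetical.

```python
# Hypothetical sketch (not the thesis implementation): answering each query
# with a random subset of a diverse model pool so that black-box attackers
# observe stochastic confidence scores across repeated queries.
import random
import torch


class RandomizedEnsembleDefence:
    """Serves predictions from a randomly selected subset of trained models.

    The pool is assumed to be diverse (e.g. trained with different seeds or an
    SVGD-style objective); diversity is what makes the per-query randomness
    disruptive to gradient estimation and search-based black-box attacks.
    """

    def __init__(self, models, subset_size=1):
        self.models = models          # list of already-trained torch.nn.Module classifiers
        self.subset_size = subset_size

    @torch.no_grad()
    def predict(self, x):
        # Draw a fresh random subset for every query, so two identical
        # queries may be answered by different models.
        chosen = random.sample(self.models, k=self.subset_size)
        probs = torch.stack([m(x).softmax(dim=-1) for m in chosen])
        # Return averaged confidence scores for the chosen subset.
        return probs.mean(dim=0)
```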