Towards Robust Deep Neural Networks
Date
2022
Authors
Doan, Gia Bao
Advisors
Ranasinghe, Damith Chinthana
Abbasnejad, Ehsan
Type
Thesis
Abstract
Deep neural networks (DNNs) enable state-of-the-art performance for most machine
learning tasks. Unfortunately, they are vulnerable to attacks, such as Trojans during
training and Adversarial Examples at test time. Adversarial Examples are inputs
with carefully crafted perturbations added to benign samples. In the Computer
Vision domain, although the perturbations are imperceptible to humans, Adversarial
Examples can successfully misguide or fool DNNs. Meanwhile, Trojan or backdoor
attacks involve attackers tampering with the training process, for example by injecting
poisoned training data, to embed a backdoor into the network that can be activated
during model deployment when the Trojan triggers (known only to the attackers)
appear in the model's inputs. This dissertation investigates methods of building robust
DNNs against these training-time and test-time threats.
Recognising the threat of Adversarial Examples in the malware domain, this research
considers the problem of realising a robust DNN-based malware detector against Adversarial
Example attacks by developing a Bayesian adversarial learning algorithm. In contrast
to vision tasks, adversarial learning in a domain without a differentiable or invertible
mapping function from the problem space (such as software code inputs) to the feature
space is hard. The study proposes an alternative: performing adversarial learning in
the feature space and proving that the projection of perturbed, yet valid, malware from the
problem space into the feature space is a subset of feature-space adversarial
attacks. The Bayesian approach improves benign performance, provably bounds
the difference between adversarial risk and empirical risk, and improves robustness
against increasingly large attack budgets not employed during training.
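As an illustration of this direction only, the following minimal PyTorch sketch performs feature-space adversarial training for a binary malware classifier. The network size, feature dimension, attack budget, and random stand-in data are illustrative placeholders, and the Bayesian treatment of the weights used in the thesis (posterior sampling over network parameters) is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MalwareMLP(nn.Module):
    """Small stand-in classifier over a fixed-length binary feature vector."""
    def __init__(self, num_features=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, x):
        return self.net(x)

def feature_space_attack(model, x, y, epsilon=0.1, steps=10):
    """Projected gradient ascent on the loss, applied directly to the feature
    vector rather than to the underlying software binary."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            stepped = x_adv + (epsilon / steps) * grad.sign()
            # Project back into the epsilon-ball around the benign features
            # and keep every feature in its valid [0, 1] range.
            x_adv = x + (stepped - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
        x_adv.requires_grad_(True)
    return x_adv.detach()

# One illustrative adversarial-training step on random data standing in for
# extracted malware/goodware features.
model = MalwareMLP()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 1024)
y = torch.randint(0, 2, (32,))
x_adv = feature_space_attack(model, x, y)
loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```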
To improve the robustness of DNNs against Adversarial Examples in the Computer
Vision domain, the research considers the problem of developing a Bayesian learning
algorithm to realise a robust DNN against such attacks. Accordingly, a novel
Bayesian learning method is designed that conceptualises an information gain objective
to measure and force the information learned from benign and Adversarial
Examples to be similar. The method proves that minimising this information gain
objective further tightens the bound on the difference between adversarial risk and
empirical risk, moving towards a basis for a principled method of adversarially training
Bayesian Neural Networks (BNNs).
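A hedged sketch of what such an information gain style regulariser could look like is given below: the divergence between the predictive distributions that an approximately Bayesian model assigns to a benign input and to its adversarial counterpart is penalised alongside the usual classification loss. Monte Carlo dropout stands in for posterior sampling, and the model, stand-in attack, and weighting below are illustrative assumptions rather than the thesis's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Tiny classifier with dropout, used here to approximate posterior sampling."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def mc_predictive(model, x, num_samples=8):
    """Average predictive distribution over stochastic forward passes."""
    model.train()  # keep dropout active so each pass acts as a different "posterior sample"
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(num_samples)])
    return probs.mean(dim=0)

def information_gap(model, x_benign, x_adv, num_samples=8):
    """Symmetric KL divergence between benign and adversarial predictive distributions."""
    p = mc_predictive(model, x_benign, num_samples).clamp_min(1e-8)
    q = mc_predictive(model, x_adv, num_samples).clamp_min(1e-8)
    return ((p * (p.log() - q.log())).sum(-1) + (q * (q.log() - p.log())).sum(-1)).mean()

# Illustrative loss on random stand-in data: cross-entropy on the adversarial
# input plus the divergence term that pulls the two predictive distributions together.
model = SmallCNN()
x = torch.rand(8, 3, 32, 32)
x_adv = (x + 0.03 * torch.randn_like(x)).clamp(0, 1)  # stand-in for a real attack
y = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x_adv), y) + 1.0 * information_gap(model, x, x_adv)
loss.backward()
```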
Recognising the threat from backdoor or Trojan attacks against DNNs, the research
considers the problem of finding a robust defence method that is effective against Trojan
attacks. The research explores a new idea in the domain, sanitisation of inputs, and
proposes Februus to neutralise highly potent and insidious Trojan attacks on DNN
systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in
a deep neural network model using a secret trigger, a Trojan, applied to any input
to alter the model’s decision to a target prediction—a target determined by and only
known to the attacker. Februus sanitises the incoming input by surgically removing the
potential trigger artifacts and restoring the input for the classification task. Februus
enables effective Trojan mitigation with no loss of performance
on sanitised inputs, trojaned or benign. This method is highly effective at defending
against advanced Trojan attack variants as well as challenging, adaptive attacks where
attackers have full knowledge of the defence method.
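The input sanitisation idea can be sketched as follows, under strong simplifying assumptions: a saliency map (standing in for a real attribution method) flags the most influential region of the input, which is cut out and refilled from blurred surrounding context (standing in for a learned inpainter) before classification. This is not the Februus implementation, only an outline of the remove-and-restore pipeline.

```python
import torch
import torch.nn.functional as F

def suspicious_region_mask(saliency, patch=8):
    """Flag the patch x patch window with the highest total saliency per image."""
    b, _, h, w = saliency.shape
    window_scores = F.avg_pool2d(saliency, patch, stride=1)  # (b, 1, h-patch+1, w-patch+1)
    out_w = w - patch + 1
    idx = window_scores.flatten(1).argmax(dim=1)
    mask = torch.zeros_like(saliency)
    for i in range(b):
        r, c = divmod(idx[i].item(), out_w)
        mask[i, :, r:r + patch, c:c + patch] = 1.0
    return mask

def sanitise(images, saliency, patch=8):
    """Cut out the flagged region and refill it from blurred surrounding context."""
    mask = suspicious_region_mask(saliency, patch)
    restored = F.avg_pool2d(images, 7, stride=1, padding=3)  # crude inpainting stand-in
    return images * (1 - mask) + restored * mask

# Illustrative run on random data: a bright square stands in for a trigger, and a
# brightness map that peaks on it stands in for a real attribution map.
images = torch.rand(2, 3, 32, 32)
images[:, :, 2:10, 2:10] = 1.0
saliency = images.mean(dim=1, keepdim=True)
cleaned = sanitise(images, saliency)
```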
Investigating the connections between Trojan attacks and spatially constrained
Adversarial Examples, so-called Adversarial Patches, in the input space, the research
exposes an emerging threat: an attack exploiting the vulnerability of a DNN to generate
naturalistic adversarial patches as universal triggers. For the first time, a method based
on Generative Adversarial Networks is developed to exploit a GAN’s latent space to
search for universal naturalistic adversarial patches. The proposed attack’s advantage
is its ability to exert a high level of control, enabling attackers to craft naturalistic
adversarial patches that are highly effective, robust against state-of-the-art DNNs, and
deployable in the physical world without needing to interfere with the model building
process or risking discovery. Until now, this has only been demonstrably possible
using Trojan attack methods.
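A hedged sketch of such a latent-space search is shown below: a single latent code is optimised so that the patch produced by a pretrained generator, pasted onto a batch of images, degrades a victim classifier. Here `generator` (assumed to expose a `latent_dim` attribute and produce image-shaped patches) and `victim` are placeholders for pretrained models, and the fixed patch placement and untargeted cross-entropy loss are simplifications; the thesis's own objective and target models may differ.

```python
import torch
import torch.nn.functional as F

def paste_patch(images, patch, top=0, left=0):
    """Overwrite a fixed region of every image with the generated patch."""
    out = images.clone()
    _, _, ph, pw = patch.shape
    out[:, :, top:top + ph, left:left + pw] = patch
    return out

def search_latent_patch(generator, victim, images, labels, steps=200, lr=0.05):
    """Optimise one latent code so the generated patch fools the victim on many images."""
    z = torch.randn(1, generator.latent_dim, requires_grad=True)
    optimiser = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        patch = generator(z).clamp(0, 1)         # stay on the GAN's naturalistic manifold
        logits = victim(paste_patch(images, patch))
        loss = -F.cross_entropy(logits, labels)  # untargeted: push predictions off the labels
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return generator(z).clamp(0, 1).detach()
```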
School/Discipline
School of Computer Science
Dissertation Note
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
Provenance
This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals