Towards Robust Deep Neural Networks
Date
2022
Authors
Doan, Gia Bao
Advisors
Ranasinghe, Damith Chinthana
Abbasnejad, Ehsan
Type
Thesis
Abstract
Deep neural networks (DNNs) enable state-of-the-art performance for most machine
learning tasks. Unfortunately, they are vulnerable to attacks, such as Trojans during
training and Adversarial Examples at test time. Adversarial Examples are inputs
with carefully crafted perturbations added to benign samples. In the Computer
Vision domain, although the perturbations are imperceptible to humans, Adversarial
Examples can successfully misguide or fool DNNs. Meanwhile, Trojan or backdoor
attacks involve attackers tampering with the training process, for example by injecting
poisoned training data, to embed a backdoor into the network that can be activated
during model deployment when the Trojan triggers (known only to the attackers)
appear in the model's inputs. This dissertation investigates methods of building robust
DNNs against these training-time and test-time threats.
Recognising the threat of Adversarial Examples in the malware domain, this research
considers the problem of realising a robust DNN-based malware detector against Adversarial
Example attacks by developing a Bayesian adversarial learning algorithm. In contrast
to vision tasks, adversarial learning in a domain without a differentiable or invertible
mapping function from the problem space (such as software code inputs) to the feature
space is hard. The study proposes an alternative: performing adversarial learning in
the feature space and proving that the projection of perturbed, yet valid, malware from the
problem space into the feature space is a subset of feature-space adversarial
attacks. The Bayesian approach improves benign performance, provably bounds
the difference between adversarial risk and empirical risk, and improves robustness
against increasingly large attack budgets not employed during training.
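As an illustration of this direction only, the following minimal PyTorch sketch performs feature-space adversarial training for a binary malware classifier. The network size, feature dimension, attack budget, and random stand-in data are illustrative placeholders, and the Bayesian treatment of the weights used in the thesis (posterior sampling over network parameters) is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MalwareMLP(nn.Module):
    """Small stand-in classifier over a fixed-length binary feature vector."""
    def __init__(self, num_features=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_features, 256), nn.ReLU(),
            nn.Linear(256, 2),
        )

    def forward(self, x):
        return self.net(x)

def feature_space_attack(model, x, y, epsilon=0.1, steps=10):
    """Projected gradient ascent on the loss, applied directly to the feature
    vector rather than to the underlying software binary."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            stepped = x_adv + (epsilon / steps) * grad.sign()
            # Project back into the epsilon-ball around the benign features
            # and keep every feature in its valid [0, 1] range.
            x_adv = x + (stepped - x).clamp(-epsilon, epsilon)
            x_adv = x_adv.clamp(0, 1)
        x_adv.requires_grad_(True)
    return x_adv.detach()

# One illustrative adversarial-training step on random data standing in for
# extracted malware/goodware features.
model = MalwareMLP()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.rand(32, 1024)
y = torch.randint(0, 2, (32,))
x_adv = feature_space_attack(model, x, y)
loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
optimiser.zero_grad()
loss.backward()
optimiser.step()
```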
To improve the robustness of DNNs against Adversarial Examples in the Computer
Vision domain, the research considers the problem of developing a Bayesian learning
algorithm to realise a robust DNN against such attacks. Accordingly, a novel
Bayesian learning method is designed that conceptualises an information gain objective
to measure and force the information learned from benign and Adversarial
Examples to be similar. The method proves that minimising this information gain
objective further tightens the bound on the difference between adversarial risk and
empirical risk, moving towards a basis for a principled method of adversarially training
Bayesian Neural Networks (BNNs).
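A hedged sketch of what such an information gain style regulariser could look like is given below: the divergence between the predictive distributions that an approximately Bayesian model assigns to a benign input and to its adversarial counterpart is penalised alongside the usual classification loss. Monte Carlo dropout stands in for posterior sampling, and the model, stand-in attack, and weighting below are illustrative assumptions rather than the thesis's exact objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallCNN(nn.Module):
    """Tiny classifier with dropout, used here to approximate posterior sampling."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Dropout2d(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.head(self.features(x).flatten(1))

def mc_predictive(model, x, num_samples=8):
    """Average predictive distribution over stochastic forward passes."""
    model.train()  # keep dropout active so each pass acts as a different "posterior sample"
    probs = torch.stack([F.softmax(model(x), dim=-1) for _ in range(num_samples)])
    return probs.mean(dim=0)

def information_gap(model, x_benign, x_adv, num_samples=8):
    """Symmetric KL divergence between benign and adversarial predictive distributions."""
    p = mc_predictive(model, x_benign, num_samples).clamp_min(1e-8)
    q = mc_predictive(model, x_adv, num_samples).clamp_min(1e-8)
    return ((p * (p.log() - q.log())).sum(-1) + (q * (q.log() - p.log())).sum(-1)).mean()

# Illustrative loss on random stand-in data: cross-entropy on the adversarial
# input plus the divergence term that pulls the two predictive distributions together.
model = SmallCNN()
x = torch.rand(8, 3, 32, 32)
x_adv = (x + 0.03 * torch.randn_like(x)).clamp(0, 1)  # stand-in for a real attack
y = torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x_adv), y) + 1.0 * information_gap(model, x, x_adv)
loss.backward()
```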
Recognising the threat from backdoor or Trojan attacks against DNNs, the research
considers the problem of finding a robust defence method that is effective against Trojan
attacks. The research explores a new idea in the domain, sanitisation of inputs, and
proposes Februus to neutralise highly potent and insidious Trojan attacks on DNN
systems at run-time. In Trojan attacks, an adversary activates a backdoor crafted in
a deep neural network model using a secret trigger, a Trojan, applied to any input
to alter the model’s decision to a target prediction—a target determined by and only
known to the attacker. Februus sanitises the incoming input by surgically removing the
potential trigger artifacts and restoring the input for the classification task. Februus
enables effective Trojan mitigation with no loss of performance
on sanitised inputs, trojaned or benign. This method is highly effective at defending
against advanced Trojan attack variants as well as challenging, adaptive attacks where
attackers have full knowledge of the defence method.
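The input sanitisation idea can be sketched as follows, under strong simplifying assumptions: a saliency map (standing in for a real attribution method) flags the most influential region of the input, which is cut out and refilled from blurred surrounding context (standing in for a learned inpainter) before classification. This is not the Februus implementation, only an outline of the remove-and-restore pipeline.

```python
import torch
import torch.nn.functional as F

def suspicious_region_mask(saliency, patch=8):
    """Flag the patch x patch window with the highest total saliency per image."""
    b, _, h, w = saliency.shape
    window_scores = F.avg_pool2d(saliency, patch, stride=1)  # (b, 1, h-patch+1, w-patch+1)
    out_w = w - patch + 1
    idx = window_scores.flatten(1).argmax(dim=1)
    mask = torch.zeros_like(saliency)
    for i in range(b):
        r, c = divmod(idx[i].item(), out_w)
        mask[i, :, r:r + patch, c:c + patch] = 1.0
    return mask

def sanitise(images, saliency, patch=8):
    """Cut out the flagged region and refill it from blurred surrounding context."""
    mask = suspicious_region_mask(saliency, patch)
    restored = F.avg_pool2d(images, 7, stride=1, padding=3)  # crude inpainting stand-in
    return images * (1 - mask) + restored * mask

# Illustrative run on random data: a bright square stands in for a trigger, and a
# brightness map that peaks on it stands in for a real attribution map.
images = torch.rand(2, 3, 32, 32)
images[:, :, 2:10, 2:10] = 1.0
saliency = images.mean(dim=1, keepdim=True)
cleaned = sanitise(images, saliency)
```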
Investigating the connections between Trojan attacks and spatially constrained
Adversarial Examples, so-called Adversarial Patches, in the input space, the research
exposes an emerging threat: an attack exploiting the vulnerability of a DNN to generate
naturalistic adversarial patches as universal triggers. For the first time, a method based
on Generative Adversarial Networks is developed to exploit a GAN’s latent space to
search for universal naturalistic adversarial patches. The proposed attack’s advantage
is its ability to exert a high level of control, enabling attackers to craft naturalistic
adversarial patches that are highly effective, robust against state-of-the-art DNNs, and
deployable in the physical world without needing to interfere with the model building
process or risking discovery. Until now, this has only been demonstrably possible
using Trojan attack methods.
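A hedged sketch of such a latent-space search is shown below: a single latent code is optimised so that the patch produced by a pretrained generator, pasted onto a batch of images, degrades a victim classifier. Here `generator` (assumed to expose a `latent_dim` attribute and produce image-shaped patches) and `victim` are placeholders for pretrained models, and the fixed patch placement and untargeted cross-entropy loss are simplifications; the thesis's own objective and target models may differ.

```python
import torch
import torch.nn.functional as F

def paste_patch(images, patch, top=0, left=0):
    """Overwrite a fixed region of every image with the generated patch."""
    out = images.clone()
    _, _, ph, pw = patch.shape
    out[:, :, top:top + ph, left:left + pw] = patch
    return out

def search_latent_patch(generator, victim, images, labels, steps=200, lr=0.05):
    """Optimise one latent code so the generated patch fools the victim on many images."""
    z = torch.randn(1, generator.latent_dim, requires_grad=True)
    optimiser = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        patch = generator(z).clamp(0, 1)         # stay on the GAN's naturalistic manifold
        logits = victim(paste_patch(images, patch))
        loss = -F.cross_entropy(logits, labels)  # untargeted: push predictions off the labels
        optimiser.zero_grad()
        loss.backward()
        optimiser.step()
    return generator(z).clamp(0, 1).detach()
```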
School/Discipline
School of Computer Science
Dissertation Note
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
Provenance
This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals