Enhancing Model Generalization in Weakly Supervised and Low-Shot Transfer Learning Scenarios

Date

2024

Authors

Chapman, Avraham Nisel

Advisors

Liu, Lingqiao

Type

Thesis

Abstract

Deep neural networks (DNNs) are essential for tasks ranging from handwriting recognition to control systems, and have acted as massive force multipliers of human potential. However, as DNNs have become larger and more complex, the amount of data needed to train them has grown as well. Obtaining such large training sets can be problematic: capturing or purchasing the data is expensive, and labelling it is more expensive still. Many techniques have been suggested for reducing the amount of labelled data required, such as weakly supervised labels or transfer learning from other domains. In this thesis, we explore two methods drawn from these areas for reducing the number of labels required to train DNNs. Weakly supervised labels can be mined from sources such as the internet; they are fairly cheap, but suffer from label noise. To address this, we increase the tolerance of DNNs to noisy labels through an adversarial training technique that improves model resilience. Our contribution is an adversarial technique that suppresses all features in a DNN's output feature vector that are specific to a particular sample, while preserving the features that are useful for classifying the samples. This reduces the amount of training data required for a given accuracy level and increases robustness against noisy labels. We also fine-tune models from other domains on small target datasets. We improve the generalizability of models trained on small target datasets by introducing a simple regularization technique that ensures the magnitudes of the DNN feature vectors are evenly distributed. This addresses the bias in pretrained models that arises from the difference between their source and target domains.
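The abstract does not give the regularization formula, but one common way to encourage evenly distributed feature magnitudes is to penalize the variance of the per-sample feature-vector L2 norms. The sketch below is an illustration only, under that assumption; the helper name `norm_variance_penalty` is hypothetical and the actual method in the thesis may differ.

```python
import math

def norm_variance_penalty(features):
    """Penalty that is zero when all feature vectors share the same L2 norm.

    `features` is a list of feature vectors (lists of floats), e.g. the
    backbone outputs for one mini-batch. The returned value grows as the
    norms become unevenly distributed across the batch.
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in features]
    mean = sum(norms) / len(norms)
    # Variance of the norms: zero iff every vector has the same magnitude.
    return sum((n - mean) ** 2 for n in norms) / len(norms)

# Two vectors with equal norms incur no penalty; unequal norms are penalized.
balanced = [[3.0, 4.0], [5.0, 0.0]]     # both have L2 norm 5
skewed = [[3.0, 4.0], [10.0, 0.0]]      # norms 5 and 10
print(norm_variance_penalty(balanced))  # → 0.0
print(norm_variance_penalty(skewed))    # → 6.25
```

In training, such a penalty would typically be scaled by a weight and added to the classification loss, so that fine-tuning on a small target set cannot let a few samples dominate through inflated feature magnitudes.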

School/Discipline

School of Computer and Mathematical Sciences

Dissertation Note

Thesis (MPhil) -- University of Adelaide, School of Computer and Mathematical Sciences, 2024

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals
