Type: Thesis
Title: Efficient Fully Convolutional Networks for Dense Prediction Tasks
Author: Liu, Yifan
Issue Date: 2021
School/Discipline: School of Computer Science
Abstract: Dense prediction is a family of fundamental problems in computer vision in which a model learns a mapping from input images to complex output structures. It includes semantic segmentation, depth estimation, and object detection, among many others, and such tasks require pixel-level labeling. Deep neural networks have been the dominant solution since the introduction of fully convolutional networks (FCNs). Well-designed, complicated network structures achieve state-of-the-art performance on benchmark datasets, but often at a high computational cost, and the cost grows further when the models are extended to video sequences. It is therefore important to design efficient fully convolutional networks for dense prediction tasks so that the models can be used on mobile devices in real-world applications. Light-weight models have drawn much attention recently. Most compact models aim for higher accuracy at lower computational cost, but they usually have to trade accuracy against efficiency. Moreover, it is hard to train a compact model properly given its limited capacity. We therefore aim to improve the performance of fully convolutional networks by applying extra constraints during training only, so that inference efficiency is preserved. Our study starts with knowledge distillation, which has proven effective in classification tasks: compact models are trained with the help of large models. We design several new distillation methods that capture structure information, taking into account the fact that dense prediction is a structured prediction problem. We also extend the distillation methods to video sequences and design temporal knowledge distillation, which improves both the temporal consistency and the accuracy of the compact models. Besides knowledge distillation, we employ auxiliary modules to provide extra gradients or supervision when training compact models.
Through our training methods, we can improve the performance of compact models without any extra computational cost during inference. The proposed training methods are general and can be applied to various network structures, datasets, and tasks. We conduct our experiments mainly on typical dense prediction tasks, e.g., semantic segmentation on both images and video sequences. We also extend our methods to object detection, depth estimation, and a multi-task learning system. Our methods outperform previous work, achieving a better trade-off between accuracy and efficiency on various dense prediction tasks.
Advisor: Shen, Chunhua; Liao, Zhibin
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021
Keywords: Dense prediction; knowledge distillation; efficient models
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at:
Appears in Collections: Research Theses

Files in This Item:
File: LiuY2021_PhD.pdf
Size: 13.42 MB
Format: Adobe PDF
