Efficient Fully Convolutional Networks for Dense Prediction Tasks

Liu, Yifan

Files

LiuY2021_PhD.pdf (13.11 MB)

Date

2021

Authors

Liu, Yifan

Advisors

Shen, Chunhua
Liao, Zhibin

Type

Thesis

Abstract

Dense prediction is a family of fundamental problems in computer vision in which a model learns a mapping from input images to complex output structures. It includes semantic segmentation, depth estimation, and object detection, among many others, and such tasks require pixel-level labeling. Deep neural networks have been the dominant solution since the introduction of fully convolutional networks (FCNs). Well-designed but complicated network structures achieve state-of-the-art performance on benchmark datasets, often at a high computational cost, and the cost grows further when the models are extended to video sequences. It is therefore important to design efficient fully convolutional networks for dense prediction tasks so that the models can run on mobile devices in real-world applications. Lightweight models have drawn much attention recently. Most compact models aim for higher accuracy at lower computational cost, but they usually have to trade accuracy against efficiency. Moreover, a compact model is hard to train properly because of its limited capacity. We therefore aim to improve the performance of fully convolutional networks by applying extra constraints during training only, so that inference efficiency is preserved. Our study starts with knowledge distillation, which has proven effective in classification tasks: compact models are trained with the help of large models. Because dense prediction is a structured prediction problem, we design several new distillation methods that capture structural information. We then extend the distillation methods to video sequences and design temporal knowledge distillation, which improves both the temporal consistency and the accuracy of the compact models. Beyond knowledge distillation, we employ auxiliary modules that provide extra gradients or supervision when training compact models.
With these training methods, we improve the performance of compact models without any extra computational cost during inference. The proposed training methods are general and can be applied to various network structures, datasets, and tasks. We conduct our experiments mainly on typical dense prediction tasks, e.g., semantic segmentation on both images and video sequences, and also extend our methods to object detection, depth estimation, and a multi-task learning system. Across these dense prediction tasks, we outperform previous work with a better trade-off between accuracy and efficiency.
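
As a rough illustration of the knowledge distillation idea mentioned in the abstract, the sketch below computes a generic pixel-wise distillation loss: the mean KL divergence between a teacher's and a student's per-pixel class distributions. It is not taken from the thesis and does not reproduce its structured or temporal distillation methods. The function names, the temperature parameter, and the nested-list H x W x C layout are assumptions made for this example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over one pixel's class logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def pixelwise_distillation_loss(student_logits, teacher_logits, temperature=1.0):
    """Mean KL(teacher || student) over all pixels.

    Both arguments are H x W x C nested lists of class logits
    (a hypothetical layout chosen to keep the sketch dependency-free).
    """
    total_kl = 0.0
    num_pixels = 0
    for s_row, t_row in zip(student_logits, teacher_logits):
        for s_pix, t_pix in zip(s_row, t_row):
            log_t = log_softmax(t_pix, temperature)
            log_s = log_softmax(s_pix, temperature)
            # KL divergence for this pixel: sum_c p_t(c) * (log p_t(c) - log p_s(c))
            total_kl += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
            num_pixels += 1
    if num_pixels == 0:
        raise ValueError("inputs contain no pixels")
    return total_kl / num_pixels


# Toy 1 x 2 "image" with 3 classes per pixel (made-up values).
teacher = [[[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]]
student = [[[1.0, 1.0, 0.0], [0.0, 0.5, 0.5]]]
loss = pixelwise_distillation_loss(student, teacher)
```

In a training setup of this kind, such a term would typically be added to the ordinary task loss of the compact model. The large model is needed only during training, which is consistent with the abstract's point that inference cost is unchanged.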

School/Discipline

School of Computer Science

Dissertation Note

Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2021

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals

Persistent link to this record

https://hdl.handle.net/2440/134023
