Please use this identifier to cite or link to this item:
Type: Theses
Title: Deep learning for fine-grained visual recognition
Author: Li, Teng
Issue Date: 2017
School/Discipline: School of Computer Science
Abstract: Fine-grained object recognition is an important task in computer vision, and the cross-convolutional-layer pooling method is one of the significant recent milestones in the field. Based on this method, we conducted a number of experiments on CompCars, a new fine-grained car dataset. The experiments illustrate the method's applicability and effectiveness on this newly designed dataset. They also show that pooling only the most distinguishable regions of the indicator maps, such as car logos and headlight areas, which usually have higher activations, with the local features in the same regions achieves better results than pooling the whole indicator maps with the corresponding local features. We therefore conjecture that better performance may be achieved with more powerful indicator maps or pooling channels that better highlight these distinguishable regions. Based on this hypothesis and inspired by cross-convolutional-layer pooling, we propose the Spatially Weighted Pooling (SWP) method, a simple yet effective pooling strategy for improving fine-grained classification performance. SWP learns about a dozen pooling channels, or spatial encoding masks, that aggregate local convolutional feature maps using learned spatial importance information and produce more discriminative features. It can be seamlessly integrated into existing convolutional neural network (CNN) architectures such as the deep residual network, and it allows end-to-end training. SWP has few parameters to learn, usually several hundred, and therefore does not introduce much computational overhead. Simply adding it before the fully-connected layers of off-the-shelf deep convolutional networks significantly improves fine-grained visual recognition performance.
We have conducted comprehensive experiments on a number of widely used fine-grained datasets with a variety of deep CNN architectures, such as Alex networks (AlexNet), VGG networks (VGGNet) and deep residual networks (ResNet). By integrating SWP into ResNet (ResNet-SWP), we achieve state-of-the-art results on three fine-grained datasets and the MIT67 indoor scene recognition dataset. With ResNet152-SWP models, we obtain 85.2% on the bird dataset CUB-200-2011 without bounding-box annotations and 87.4% with bounding-box annotations, 91.2% on FGVC-aircraft, 94.1% on Stanford-cars with bounding-box information and 82.5% on the MIT67 dataset.
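The pooling step described in the abstract can be illustrated with a minimal NumPy sketch. This is an assumption-laden reading of the abstract, not the thesis implementation: it assumes each pooling channel is a single H x W mask shared across all feature channels, that pooling is a mask-weighted spatial sum, and that the pooled vectors are concatenated. The function name `spatially_weighted_pool` and the sizes (12 masks, a 2048 x 7 x 7 feature map) are hypothetical choices for illustration; in the thesis the masks are learned end-to-end, whereas here they are random.

```python
import numpy as np


def spatially_weighted_pool(features, masks):
    """Pool a conv feature map with K spatial masks (illustrative sketch).

    features: array of shape (C, H, W), local convolutional features.
    masks:    array of shape (K, H, W), one spatial weight map per
              pooling channel (assumed shared across feature channels).
    Returns a vector of length K * C: for each mask, the mask-weighted
    spatial sum of every feature channel, concatenated over masks.
    """
    C, H, W = features.shape
    K = masks.shape[0]
    if masks.shape[1:] != (H, W):
        raise ValueError("masks and features must share spatial size")
    # pooled[k, c] = sum over (h, w) of masks[k, h, w] * features[c, h, w]
    pooled = np.einsum("khw,chw->kc", masks, features)
    return pooled.reshape(K * C)


# Hypothetical sizes: 12 masks over a 2048 x 7 x 7 feature map.
rng = np.random.default_rng(0)
features = rng.standard_normal((2048, 7, 7))
masks = rng.random((12, 7, 7))  # stand-in for learned masks

pooled = spatially_weighted_pool(features, masks)
num_mask_params = masks.size  # 12 * 7 * 7 weights under these assumptions
```

Under these assumptions the masks hold 12 x 7 x 7 = 588 weights, which is in line with the "several hundred" parameters the abstract mentions. A single uniform mask with every weight equal to 1/(H*W) reduces this sketch to ordinary global average pooling, which is one way to see the masks as a learned generalisation of it.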
Advisor: Shen, Chunhua
Lin, Guosheng
Dissertation Note: Thesis (M.Phil.) -- University of Adelaide, School of Computer Science, 2017.
Keywords: deep learning
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at:
DOI: 10.4225/55/595c7aa1a62bf
Appears in Collections: Research Theses

Files in This Item:
File                Description                 Size        Format
01front.pdf                                     146.21 kB   Adobe PDF
02whole.pdf                                     20.97 MB    Adobe PDF
Restricted Access   Library staff access only   211.65 kB   Adobe PDF
Restricted Access   Library staff access only   25.7 MB     Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.