Type: Thesis
Title: Dynamic Scene Understanding with Applications to Traffic Monitoring
Author: Hu, Qichang
Issue Date: 2017
School/Discipline: School of Computer Science
Abstract: Many breakthroughs have been witnessed in the computer vision community in recent years, largely due to deep Convolutional Neural Networks (CNNs) and large-scale datasets. This thesis investigates dynamic scene understanding from images, a problem that involves simultaneously solving several sub-tasks, including object detection, object recognition, and segmentation. Successfully completing these tasks enables us to interpret the objects of interest within a scene. Vision-based traffic monitoring is one of many fast-emerging areas in intelligent transportation systems (ITS). This thesis focuses on the following problems in traffic scene understanding: 1) how to detect and recognize all the objects of interest in street-view images; 2) how to employ CNN features and semantic pixel labelling to boost the performance of pedestrian detection; 3) how to enhance the discriminative power of CNN representations to improve fine-grained car recognition; and 4) how to learn an adaptive color space for representing vehicle images in vehicle color recognition. For the first task, we propose a single learning-based detection framework that detects three important classes of objects (traffic signs, cars, and cyclists). The proposed framework consists of a dense feature extractor and detectors for the three classes. The advantage of using one common framework is that detection is much faster, since the dense features need only be evaluated once and are then shared by all detectors. The framework introduces spatially pooled features as part of the aggregated channel features to enhance robustness to noise and image deformation. We also propose an object subcategorization scheme as a means of capturing the intra-class variation of objects.
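The abstract does not spell out the pooling operation here, but the idea behind spatially pooled channel features can be sketched as max-pooling each dense feature channel over small local windows, so the pooled responses tolerate small shifts and deformations. The window size and stride below are illustrative assumptions, not the thesis's exact settings:

```python
import numpy as np

def pool_channel(channel, size=4, stride=4):
    """Max-pool one feature channel over size x size windows.

    Pooling over local neighbourhoods makes the resulting features
    less sensitive to small shifts and image deformations.
    """
    H, W = channel.shape
    out_h = (H - size) // stride + 1
    out_w = (W - size) // stride + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = channel[i * stride:i * stride + size,
                                j * stride:j * stride + size].max()
    return out

# Dense channel features are computed once per image and then shared
# by all detectors (traffic sign, car, cyclist) in the framework.
channels = np.random.default_rng(1).random((10, 64, 64))  # e.g. 10 channels
pooled = np.stack([pool_channel(c) for c in channels])
print(pooled.shape)  # (10, 16, 16)
```

Because the pooled channels are computed once and reused, adding more detector classes costs little extra feature-extraction time.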
To address the second problem, we show that by reusing the convolutional feature maps (CFMs) of a deep CNN model as visual features to train an ensemble of boosted decision forests, we can remarkably improve pedestrian detection performance without using specially designed learning algorithms. We also show that semantic pixel labelling can be simply combined with a pedestrian detector to further boost detection performance. Fine-grained details of objects usually contain highly discriminative information that is crucial for fine-grained object recognition. Conventional pooling strategies (e.g. max-pooling, average-pooling) may discard these fine-grained details and hurt recognition performance. To remedy this problem, we propose a spatially weighted pooling (swp) strategy that considerably improves the discriminative power of CNN representations. The swp pools CNN features under the guidance of its learnt masks, which measure the importance of the spatial units in terms of discriminative power. In image color recognition, visual features are extracted from image pixels represented in a particular color space, and the choice of color space can influence the quality of the extracted features and affect recognition performance. We propose a color transformation method that converts image pixels from the RGB space to a learnt space to improve recognition performance. Moreover, we propose ColorNet, which optimizes the architecture of AlexNet and embeds a mini-CNN for color transformation, for vehicle color recognition.
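The spatially weighted pooling idea above can be sketched numerically: each learnt mask assigns a weight to every spatial unit of the convolutional feature maps, and the pooled descriptor is the mask-weighted sum over locations. The abstract does not give the exact formulation, so the shapes, the mask normalisation, and the function name below are assumptions for illustration; in the thesis the masks are learnt jointly with the network rather than given:

```python
import numpy as np

def spatially_weighted_pool(features, masks):
    """Pool CNN feature maps with learnt spatial weight masks.

    features: (C, H, W) convolutional feature maps.
    masks:    (K, H, W) non-negative spatial importance masks.
    Returns a (K * C,) pooled representation.
    """
    K = masks.shape[0]
    # Normalise each mask so its spatial weights sum to one.
    m = masks / masks.reshape(K, -1).sum(axis=1, keepdims=True).reshape(K, 1, 1)
    # Weighted sum over spatial locations for every (mask, channel) pair.
    pooled = np.einsum('khw,chw->kc', m, features)
    return pooled.reshape(-1)

rng = np.random.default_rng(0)
feats = rng.random((256, 7, 7))   # e.g. last conv layer of a CNN
masks = rng.random((4, 7, 7))     # K = 4 learnt masks
vec = spatially_weighted_pool(feats, masks)
print(vec.shape)  # (1024,)
```

Note that with a uniform mask this reduces to average pooling; a non-uniform learnt mask instead emphasises the spatial units that carry the fine-grained, discriminative details.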
Advisor: Shen, Chunhua
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2017
Keywords: Traffic scene perception
Object subcategorization
Traffic sign detection
Car detection
Cyclist detection
Pedestrian detection
Fine-grained recognition
Car model classification
Vehicle color recognition
Deep learning
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take-down form located at:
Appears in Collections: Research Theses

Files in This Item:
File: Hu2017_PhD.pdf
Size: 9.82 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.