Please use this identifier to cite or link to this item:
Type: Thesis
Title: Structured output prediction and binary code learning in computer vision.
Author: Lin, Guosheng
Issue Date: 2015
School/Discipline: School of Computer Science
Abstract: Machine learning techniques play essential roles in many computer vision applications. This thesis is dedicated to two types of machine learning techniques that are important to computer vision: structured learning and binary code learning. Structured learning predicts complex structured outputs whose components are inter-dependent. Structured outputs are common in real-world applications; an image segmentation mask is one example. Binary code learning learns hash functions that map data points to binary codes. The binary code representation is popular for large-scale similarity search, indexing and storage. This thesis makes practical and theoretical contributions to both types of learning techniques. The first part of this thesis focuses on boosting-based structured output prediction. Boosting refers to a family of methods that learn a single accurate predictor by linearly combining a set of less accurate weak learners. As a special case of structured learning, we first propose an efficient boosting method for multi-class classification, which can be applied to image classification. Unlike many existing multi-class boosting methods, we train class-specific weak learners by learning weak learners for each class separately. We also develop a fast coordinate descent method for solving the optimization problem, in which each coordinate update has a closed-form solution. For general structured output prediction, we propose a new boosting-based method, which we refer to as StructBoost. StructBoost supports nonlinear structured learning by combining a set of weak structured learners, and generalizes standard boosting approaches such as AdaBoost and LPBoost to structured learning. The resulting optimization problem is challenging in that it may involve exponentially many variables and constraints.
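The boosting idea underlying these methods, a weighted linear combination of weak learners, can be illustrated with a minimal AdaBoost sketch on a toy one-dimensional dataset with decision stumps as weak learners. This is only an illustration of the general boosting principle, not the thesis's multi-class or StructBoost formulation; all names and data here are made up.

```python
import math

# Toy 1-D dataset with binary labels in {-1, +1}.
X = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1, 1, 1, -1, -1, -1]

def stump(theta, s):
    """Weak learner: predicts s if x < theta, else -s."""
    return lambda x: s if x < theta else -s

def adaboost(X, y, rounds=5):
    n = len(X)
    w = [1.0 / n] * n            # sample weights, initially uniform
    ensemble = []                # (alpha, weak learner) pairs
    for _ in range(rounds):
        # Pick the stump with the lowest weighted error.
        best = None
        for theta in [x + 0.5 for x in X]:
            for s in (1, -1):
                h = stump(theta, s)
                err = sum(wi for wi, xi, yi in zip(w, X, y) if h(xi) != yi)
                if best is None or err < best[0]:
                    best = (err, h)
        err, h = best
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weak learner weight
        ensemble.append((alpha, h))
        # Reweight samples: mistakes get larger weight.
        w = [wi * math.exp(-alpha * yi * h(xi)) for wi, xi, yi in zip(w, X, y)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def predict(ensemble, x):
    # The final predictor is a sign of the weighted vote.
    return 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

model = adaboost(X, y)
print([predict(model, x) for x in X])   # matches y on this separable toy set
```

The structured variants in the thesis replace the binary weak learners and 0/1 loss with weak structured learners and structured losses, which is what makes the optimization exponentially large.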
We develop cutting-plane and column-generation-based algorithms to solve the optimization efficiently. We show the versatility and usefulness of StructBoost on a range of problems, such as optimizing the tree loss for hierarchical multi-class classification, optimizing the Pascal overlap criterion for robust visual tracking, and learning conditional random field parameters for image segmentation. The last part of this thesis focuses on hashing methods for binary code learning. We develop three novel hashing methods that address different aspects of binary code learning. We first present a column-generation-based hash function learning method for preserving triplet-based relative similarity. Given a set of triplets that encode pairwise similarity comparisons, our method learns hash functions within the large-margin learning framework; at each iteration of the column generation procedure, the best hash function is selected. We show that this triplet-based, large-margin formulation learns high-quality hash functions. The second hashing method is a flexible and general method with a two-step learning scheme. Most existing approaches to hashing apply a single form of hash function, together with an optimization process that is deeply coupled to that specific form. This tight coupling restricts the flexibility of the method to respond to the data, and can result in complex optimization problems that are difficult to solve. We propose a flexible yet simple framework that accommodates different types of loss functions and hash functions. This framework places a number of existing approaches to hashing in context, and simplifies the development of new problem-specific hashing methods. It decomposes the hash learning problem into two steps: hash bit learning, followed by hash function learning based on the learned bits.
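The two-step scheme can be sketched on toy data. In this sketch the target bits for step one are simply signs of random projections, a placeholder standing in for the thesis's binary quadratic formulation, and step two fits one plain least-squares linear classifier per bit; every name and choice here is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 100 points in 2-D.
X = rng.normal(size=(100, 2))

# Step 1 (placeholder): obtain target binary codes in {-1, +1}.
# The thesis formulates this step as a binary quadratic problem; here we
# just take signs of random projections so the sketch is self-contained.
W_true = rng.normal(size=(2, 4))
B = np.sign(X @ W_true)          # one column of target bits per hash function

# Step 2: fit one classifier per bit to reproduce the learned bits.
# A least-squares linear fit stands in for "any standard binary classifier".
W_hat = np.linalg.lstsq(X, B, rcond=None)[0]
B_hat = np.sign(X @ W_hat)

print(f"bit agreement: {(B_hat == B).mean():.2f}")
```

The point of the decomposition is visible even in this toy form: step one fixes the codes without committing to a hash function family, and step two is an ordinary supervised learning problem, so either step can be swapped out independently.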
The first step can typically be formulated as a binary quadratic problem, and the second step can be accomplished by training standard binary classifiers; both steps can be solved by leveraging well-studied algorithms from the literature. The third hashing method, an extension of our general two-step method, aims for efficient and effective hash function learning on large-scale, high-dimensional data. Non-linear hash functions have demonstrated their advantage over linear ones due to their powerful generalization capability. In the literature, kernel functions are typically used to achieve non-linearity in hashing, which yields encouraging retrieval performance at the price of slow training and evaluation. We propose instead to use boosted decision trees to achieve non-linearity in hashing; they are fast to train and evaluate, and hence more suitable for hashing high-dimensional data. In our approach, we first propose sub-modular formulations for the binary code inference problem and an efficient GraphCut-based block search method for solving large-scale inference. We then learn hash functions by training boosted decision trees to fit the binary codes. We show that our method significantly outperforms most existing methods in both retrieval precision and training time, especially on high-dimensional data.
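The retrieval side that all of these hashing methods target can be illustrated in a few lines: once data points are encoded as compact binary codes, similarity search reduces to Hamming-distance ranking, computed with XOR and a population count. The 8-bit codes and item names below are made up for illustration.

```python
# Made-up 8-bit binary codes, packed into Python ints.
database = {
    "img_a": 0b10110010,
    "img_b": 0b10110011,
    "img_c": 0b01001100,
}
query = 0b10110110

def hamming(a, b):
    # XOR marks differing bits; counting the 1s gives the Hamming distance.
    return bin(a ^ b).count("1")

# Rank database items by Hamming distance to the query code.
ranked = sorted(database, key=lambda k: hamming(database[k], query))
print(ranked)   # nearest code first
```

This constant-time-per-comparison, bit-level arithmetic is why binary codes are attractive for large-scale similarity search, indexing and storage, as the abstract notes.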
Advisor: Shen, Chunhua
Suter, David
Chin, Tat-Jun
Dissertation Note: Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2015
Keywords: machine learning; computer vision; structured output prediction; structured learning; binary code learning; hashing; image retrieval; image segmentation
Provenance: This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at:
Appears in Collections:Research Theses

Files in This Item:
01front.pdf (325.86 kB, Adobe PDF)
02whole.pdf (4.05 MB, Adobe PDF)
Permissions: Library staff access only (248.23 kB, Adobe PDF)
Restricted: Library staff access only (32.97 MB, Adobe PDF)

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.