Visual Perception and Reconstruction at Scale
Date
2025
Authors
Ge, Yongtao
Advisors
Shen, Chunhua
Liao, Zhibin
Type
Thesis
Abstract
The advancement of visual perception and reconstruction tasks relies heavily on the availability of large, accurately annotated datasets. However, acquiring such data is often expensive, time-consuming, and sometimes infeasible, particularly for complex scenarios or novel applications. This limitation poses a significant bottleneck in scaling these crucial computer vision technologies for real-world deployment. Traditional fully supervised learning paradigms demand extensive manual labeling, hindering progress in domains where data collection is challenging or annotation is subjective or expensive. Consequently, there is a critical need to explore alternative learning methodologies that can effectively leverage less expensive forms of data, generate informative synthetic data, and transfer knowledge from existing large foundation models. This thesis aims to address the challenge of data scarcity in visual perception and reconstruction by investigating and developing data-efficient learning strategies. Specifically, we focus on three key approaches: (1) weakly and semi-supervised learning, exploring methods to train high-performing models using data with reduced or imprecise annotations, thereby bridging the gap between data availability and annotation cost; (2) synthetic data generation, investigating techniques to create large, perfectly labeled datasets through computer graphics and simulation, offering precise control over training conditions and overcoming the limitations of real-world data acquisition; and (3) leveraging pre-trained priors from large-scale weak annotations, e.g., discriminative priors from self-supervised learning and generative priors from text-to-image generation, to improve the performance and robustness of models trained on limited task-specific data.
Our overarching goal is to demonstrate the efficacy of these data-efficient techniques across a spectrum of fundamental visual perception and 3D reconstruction tasks, including 2D object detection and segmentation, video matting, 3D human pose and shape estimation, monocular depth estimation, and multi-view reconstruction. By systematically evaluating and advancing these methodologies, this thesis seeks to contribute to the development of more scalable, adaptable, and practically deployable computer vision systems in data-constrained environments.
School/Discipline
School of Computer and Mathematical Sciences
Dissertation Note
Thesis (Ph.D.) -- University of Adelaide, School of Computer and Mathematical Sciences, 2025
Provenance
This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals