3D Scene Reconstruction from A Monocular Image

Yin, Wei

Files

Yin2022_PhD.pdf (31.23 MB)

Date

2022

Authors

Yin, Wei

Advisors

Shen, Chunhua
Wu, Qi

Type

Thesis

Abstract

3D scene reconstruction is a fundamental task in computer vision. Established approaches to this task are based on multi-view geometry: they establish correspondences between feature points across consecutive frames or multiple views, from which the 3D positions of those points can be recovered. In contrast, we aim to achieve dense 3D scene shape reconstruction from a single in-the-wild image. Without multiple views available, we rely on deep learning techniques, since deep neural networks have recently become the dominant solution for a wide range of computer vision problems. We therefore propose a two-stage learning-based method. In the first stage, we employ fully convolutional neural networks to learn accurate depth from a monocular image. To recover high-quality depth, we lift the depth to 3D space and propose a global geometric constraint, termed virtual normal loss. To improve the generalization ability of the monocular depth estimation module, we construct a large-scale and diverse dataset and propose to learn affine-invariant depth on it. Experiments demonstrate that our monocular depth estimation methods work robustly in the wild and recover high-quality 3D geometry information. In the second stage, we propose a novel module that predicts the focal length with a point cloud network. Instead of predicting the focal length directly, the module uses point cloud encoder networks to predict focal length adjustment factors from an initial guess of the scene point cloud reconstruction. The domain gap is significantly less of an issue for point clouds than for images. Combining the two stages, 3D shape can be recovered from a single input image. Note that this reconstruction is only up to scale. To recover metric 3D shape, we propose to input sparse points as guidance. Our proposed training method significantly improves the robustness of the system, including robustness to various sparsity patterns and diverse scenes.
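As a rough illustration of two ideas the abstract mentions, the sketch below shows (a) what "affine-invariant" depth implies, namely that a prediction is only meaningful up to an unknown scale and shift, and (b) how a depth map can be lifted to a 3D point cloud once a focal length is available. It is not the thesis's method: the thesis uses learned networks for focal length and for sparse-point guidance, whereas this sketch substitutes a closed-form least-squares fit against a few metric depth samples and a plain pinhole camera model. The function names `fit_scale_shift` and `unproject`, the centered principal point, and all numeric values are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def fit_scale_shift(pred: Sequence[float], target: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares (scale, shift) minimising
    sum((scale * pred + shift - target)^2). Hypothetical helper."""
    n = len(pred)
    if n < 2 or n != len(target):
        raise ValueError("need at least two matching samples")
    mean_p = sum(pred) / n
    mean_t = sum(target) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("predicted depths are constant; scale is undetermined")
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, target))
    scale = cov / var_p
    shift = mean_t - scale * mean_p
    return scale, shift


def unproject(depth: Sequence[Sequence[float]], focal: float) -> List[Point3D]:
    """Lift a depth map to camera-space points with a pinhole model.
    Assumption: the principal point sits at the image centre."""
    h = len(depth)
    w = len(depth[0])
    cx = (w - 1) / 2.0
    cy = (h - 1) / 2.0
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            points.append(((u - cx) * d / focal, (v - cy) * d / focal, d))
    return points


# Toy example: a 3x3 affine-invariant prediction and three sparse metric depths.
affine_depth = [[0.1, 0.2, 0.3], [0.2, 0.3, 0.4], [0.3, 0.4, 0.5]]
sparse = {(0, 0): 1.5, (1, 1): 2.5, (2, 2): 3.5}  # (row, col) -> metric depth

pred = [affine_depth[r][c] for (r, c) in sparse]
scale, shift = fit_scale_shift(pred, list(sparse.values()))

# Apply the recovered scale and shift, then lift to 3D with an assumed focal length.
metric_depth = [[scale * d + shift for d in row] for row in affine_depth]
cloud = unproject(metric_depth, focal=2.0)
```

The fit needs at least two sparse samples with distinct predicted depths; with fewer, scale and shift cannot both be determined. Note also that an incorrect focal length or an uncorrected shift distorts the recovered shape, which is consistent with the abstract's motivation for estimating the focal length in a separate stage.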

School/Discipline

School of Computer Science

Dissertation Note

Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022

Provenance

This electronic version is made publicly available by the University of Adelaide in accordance with its open access policy for student theses. Copyright in this thesis remains with the author. This thesis may incorporate third party material which has been used by the author pursuant to Fair Dealing exceptions. If you are the owner of any included third party copyright material you wish to be removed from this electronic version, please complete the take down form located at: http://www.adelaide.edu.au/legals

Persistent link to this record

https://hdl.handle.net/2440/134585
