Real-time monocular object instance 6D pose estimation

Authors: Do, T.; Pham, T.; Cai, M.; Reid, I.
Dates: 2020-05-13; 2020-05-13; 2019
Citation: Proceedings of the 29th British Machine Vision Conference (BMVC 2018), 2019, pp.1-12
URI: http://hdl.handle.net/2440/124685

Abstract: In this work, we present LieNet, a novel deep learning framework that simultaneously detects and segments multiple object instances and estimates their 6D poses from a single RGB image, without requiring additional post-processing. Our system is accurate and fast (∼10 fps), making it well suited to real-time applications. In particular, LieNet detects and segments object instances in the image in the manner of modern instance segmentation networks such as Mask R-CNN, but contains a novel additional sub-network for 6D pose estimation. LieNet estimates the rotation matrix of an object by regressing a Lie algebra-based rotation representation, and estimates the translation vector by predicting the distance of the object to the camera center. Experiments on two standard pose benchmarking datasets show that LieNet greatly outperforms other recent CNN-based pose prediction methods when they are used with monocular images and without post-refinement.

Language: en
Rights: © 2018. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.
Type: Conference paper
Identifiers: 1000000490; 2-s2.0-85084015255; 2-s2.0-85068438027; 498059
ORCID: Reid, I. [0000-0001-7790-6423]
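
The abstract describes regressing a Lie algebra-based rotation representation and recovering translation from a predicted distance to the camera. The following is a minimal sketch of how such outputs could be turned into a pose, not the authors' implementation. It assumes the rotation is a 3-vector in so(3) mapped to a rotation matrix with the exponential map (Rodrigues' formula), and that the predicted distance is treated as depth along the optical axis and combined with a 2D object center and pinhole intrinsics. The function names `so3_exp` and `backproject_center`, and the parameters `u`, `v`, `z`, `fx`, `fy`, `cx`, `cy`, are hypothetical and do not come from the record.

```python
import numpy as np


def so3_exp(omega):
    """Map a 3-vector in so(3) (axis times angle) to a 3x3 rotation matrix."""
    omega = np.asarray(omega, dtype=float)
    theta = np.linalg.norm(omega)  # rotation angle in radians
    # Skew-symmetric (cross-product) matrix of omega.
    K = np.array([
        [0.0, -omega[2], omega[1]],
        [omega[2], 0.0, -omega[0]],
        [-omega[1], omega[0], 0.0],
    ])
    if theta < 1e-8:
        # Small-angle case: first-order approximation avoids dividing by ~0.
        return np.eye(3) + K
    # Rodrigues' formula: R = I + (sin t / t) K + ((1 - cos t) / t^2) K^2
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta ** 2) * (K @ K))


def backproject_center(u, v, z, fx, fy, cx, cy):
    """Assumed translation recovery: lift pixel (u, v) to 3D at depth z
    using pinhole intrinsics (fx, fy, cx, cy)."""
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])


# Example: a quarter turn about the z-axis, object centered in the image.
R = so3_exp([0.0, 0.0, np.pi / 2])
t = backproject_center(u=320.0, v=240.0, z=2.0,
                       fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

Regressing three unconstrained so(3) parameters avoids having to enforce orthogonality on a 9-element matrix output, since the exponential map always yields a valid rotation.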