Temporally Consistent Referring Video Object Segmentation With Hybrid Memory
Date
2024
Authors
Miao, B.
Bennamoun, M.
Gao, Y.
Shah, M.
Mian, A.
Type
Journal article
Citation
IEEE Transactions on Circuits and Systems for Video Technology, 2024; 34(11):11373-11385
Statement of Responsibility
Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian
Abstract
Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR.
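The abstract introduces a Mask Consistency Score (MCS) for evaluating temporal consistency, but does not give its definition. As a rough illustration of the general idea, the sketch below computes a simple consistency proxy: the mean intersection-over-union (IoU) between the predicted masks of consecutive frames. The function names and the metric itself are assumptions for illustration only and are not the paper's actual MCS formulation.

```python
import numpy as np

def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return float(inter) / float(union) if union > 0 else 1.0

def temporal_consistency(masks: list) -> float:
    """Mean IoU between masks of consecutive frames: a simple proxy
    (NOT the paper's MCS) for how stably an object is segmented
    over time."""
    if len(masks) < 2:
        return 1.0
    ious = [mask_iou(masks[t], masks[t + 1]) for t in range(len(masks) - 1)]
    return float(np.mean(ious))

# Toy example: a 2x2 object shifting one pixel per frame across three frames.
m0 = np.zeros((4, 4), dtype=bool); m0[1:3, 0:2] = True
m1 = np.zeros((4, 4), dtype=bool); m1[1:3, 1:3] = True
m2 = np.zeros((4, 4), dtype=bool); m2[1:3, 2:4] = True
score = temporal_consistency([m0, m1, m2])
```

Each consecutive pair overlaps in 2 of 6 combined pixels, so the score here is 1/3; a perfectly static segmentation would score 1.0.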
Rights
© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.