Temporally Consistent Referring Video Object Segmentation With Hybrid Memory

Miao, B.; Bennamoun, M.; Gao, Y.; Shah, M.; Mian, A.

doi:10.1109/TCSVT.2024.3419119

dc.contributor.author	Miao, B.
dc.contributor.author	Bennamoun, M.
dc.contributor.author	Gao, Y.
dc.contributor.author	Shah, M.
dc.contributor.author	Mian, A.
dc.date.issued	2024
dc.description.abstract	Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR.
dc.description.statementofresponsibility	Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian
dc.identifier.citation	IEEE Transactions on Circuits and Systems for Video Technology, 2024; 34(11):11373-11385
dc.identifier.doi	10.1109/TCSVT.2024.3419119
dc.identifier.issn	1051-8215
dc.identifier.issn	1558-2205
dc.identifier.orcid	Miao, B. [0000-0002-3025-4429]
dc.identifier.uri	https://hdl.handle.net/2440/145804
dc.language.iso	en
dc.publisher	Institute of Electrical and Electronics Engineers
dc.relation.grant	http://purl.org/au-research/grants/arc/IH180100002
dc.relation.grant	http://purl.org/au-research/grants/arc/FT210100268
dc.rights	© 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
dc.source.uri	https://doi.org/10.1109/tcsvt.2024.3419119
dc.subject	Referring video object segmentation; temporal consistency; deep learning; feature extraction
dc.title	Temporally Consistent Referring Video Object Segmentation With Hybrid Memory
dc.type	Journal article
pubs.publication-status	Published
