Temporally Consistent Referring Video Object Segmentation With Hybrid Memory
dc.contributor.author | Miao, B. | |
dc.contributor.author | Bennamoun, M. | |
dc.contributor.author | Gao, Y. | |
dc.contributor.author | Shah, M. | |
dc.contributor.author | Mian, A. | |
dc.date.issued | 2024 | |
dc.description.abstract | Referring Video Object Segmentation (R-VOS) methods face challenges in maintaining consistent object segmentation due to temporal context variability and the presence of other visually similar objects. We propose an end-to-end R-VOS paradigm that explicitly models temporal instance consistency alongside the referring segmentation. Specifically, we introduce a novel hybrid memory that facilitates inter-frame collaboration for robust spatio-temporal matching and propagation. Features of frames with automatically generated high-quality reference masks are propagated to segment the remaining frames based on multi-granularity association to achieve temporally consistent R-VOS. Furthermore, we propose a new Mask Consistency Score (MCS) metric to evaluate the temporal consistency of video segmentation. Extensive experiments demonstrate that our approach enhances temporal consistency by a significant margin, leading to top-ranked performance on popular R-VOS benchmarks, i.e., Ref-YouTube-VOS (67.1%) and Ref-DAVIS17 (65.6%). The code is available at https://github.com/bo-miao/HTR. | |
dc.description.statementofresponsibility | Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Mubarak Shah, Ajmal Mian | |
dc.identifier.citation | IEEE Transactions on Circuits and Systems for Video Technology, 2024; 34(11):11373-11385 | |
dc.identifier.doi | 10.1109/TCSVT.2024.3419119 | |
dc.identifier.issn | 1051-8215 | |
dc.identifier.issn | 1558-2205 | |
dc.identifier.orcid | Miao, B. [0000-0002-3025-4429] | |
dc.identifier.uri | https://hdl.handle.net/2440/145804 | |
dc.language.iso | en | |
dc.publisher | Institute of Electrical and Electronics Engineers | |
dc.relation.grant | http://purl.org/au-research/grants/arc/IH180100002 | |
dc.relation.grant | http://purl.org/au-research/grants/arc/FT210100268 | |
dc.rights | © 2024 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. | |
dc.source.uri | https://doi.org/10.1109/tcsvt.2024.3419119 | |
dc.subject | Referring video object segmentation; temporal consistency; deep learning; feature extraction | |
dc.title | Temporally Consistent Referring Video Object Segmentation With Hybrid Memory | |
dc.type | Journal article | |
pubs.publication-status | Published | |