Spectrum-guided Multi-granularity Referring Video Object Segmentation

Miao, B.; Bennamoun, M.; Gao, Y.; Mian, A.

doi:10.1109/ICCV51070.2023.00091

Date

2024

Authors

Miao, B.

Bennamoun, M.

Gao, Y.

Mian, A.

Type:

Conference paper

Citation

Proceedings / IEEE International Conference on Computer Vision. IEEE International Conference on Computer Vision, 2024, pp.920-930

Statement of Responsibility

Bo Miao, Mohammed Bennamoun, Yongsheng Gao, Ajmal Mian

Conference Name

IEEE/CVF International Conference on Computer Vision (ICCV) (1 Oct 2023 - 6 Oct 2023 : Paris, France)

DOI

10.1109/ICCV51070.2023.00091

Abstract

Current referring video object segmentation (R-VOS) techniques extract conditional kernels from encoded (low-resolution) vision-language features to segment the decoded high-resolution features. We discovered that this causes significant feature drift, which the segmentation kernels struggle to perceive during the forward computation. This negatively affects the ability of segmentation kernels. To address the drift problem, we propose a Spectrum-guided Multi-granularity (SgMg) approach, which performs direct segmentation on the encoded features and employs visual details to further optimize the masks. In addition, we propose Spectrum-guided Cross-modal Fusion (SCF) to perform intra-frame global interactions in the spectral domain for effective multimodal representation. Finally, we extend SgMg to perform multi-object R-VOS, a new paradigm that enables simultaneous segmentation of multiple referred objects in a video. This not only makes R-VOS faster, but also more practical. Extensive experiments show that SgMg achieves state-of-the-art performance on four video benchmark datasets, outperforming the nearest competitor by 2.8% points on Ref-YouTube-VOS. Our extended SgMg enables multi-object R-VOS, runs about 3× faster while maintaining satisfactory performance. Code is available at https://github.com/bo-miao/SgMg.

Grant ID

http://purl.org/au-research/grants/arc/IH180100002
http://purl.org/au-research/grants/arc/FT210100268

Published Version

https://ieeexplore.ieee.org/xpl/conhome/10376473/proceeding

Persistent link to this record

https://hdl.handle.net/2440/146797
