VLAD-BuFF: Burst-Aware Fast Feature Aggregation for Visual Place Recognition

Khaliq, A.; Xu, M.; Hausler, S.; Milford, M.; Garg, S.

doi:10.1007/978-3-031-72784-9_25

Date

2025

Authors

Khaliq, A.

Xu, M.

Hausler, S.

Milford, M.

Garg, S.

Editors

Leonardis, A.
Ricci, E.
Roth, S.
Russakovsky, O.
Sattler, T.
Varol, G.

Type:

Conference paper

Citation

Lecture Notes in Artificial Intelligence, 2025 / Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (ed./s), vol.15102, pp.447-466

Statement of Responsibility

Ahmad Khaliq, Ming Xu, Stephen Hausler, Michael Milford, Sourav Garg

Conference Name

18th European Conference on Computer Vision (ECCV) (29 Sep 2024 - 4 Oct 2024 : Milan)

DOI

10.1007/978-3-031-72784-9_25

Abstract

Visual Place Recognition (VPR) is a crucial component of many visual localization pipelines for embodied agents. VPR is often formulated as an image retrieval task aimed at jointly learning local features and an aggregation method. The current state-of-the-art VPR methods rely on VLAD aggregation, which can be trained to learn a weighted contribution of features through their soft assignment to cluster centers. However, this process has two key limitations. Firstly, the feature-to-cluster weighting does not account for over-represented repetitive structures within a cluster, e.g., shadows or window panes; this phenomenon is also referred to as the ‘burstiness’ problem, classically solved by discounting repetitive features before aggregation. Secondly, feature-to-cluster comparisons are compute-intensive for state-of-the-art image encoders with high-dimensional local features. This paper addresses these limitations by introducing VLAD-BuFF with two novel contributions: i) a self-similarity based feature discounting mechanism to learn Burst-aware features within end-to-end VPR training, and ii) Fast Feature aggregation by reducing local feature dimensions specifically through PCA-initialized learnable pre-projection. We benchmark our method on 9 public datasets, where VLAD-BuFF sets a new state of the art. Our method is able to maintain its high recall even for 12× reduced local feature dimensions, thus enabling fast feature aggregation without compromising on recall. Through additional qualitative studies, we show how our proposed weighting method effectively downweights the non-distinctive features. Source code: https://github.com/Ahmedest61/VLAD-BuFF/.
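The two ideas named in the abstract, soft-assignment VLAD and discounting of repeated ("bursty") features, can be illustrated with a small pure-Python sketch. This is not the authors' implementation: the function names (`burst_weights`, `soft_vlad`), the specific discount formula, and the parameters `temperature` and `alpha` are assumptions chosen only for illustration, and nothing here is learned end-to-end as in the paper. The PCA-initialized pre-projection is not shown.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v, eps=1e-12):
    norm = math.sqrt(dot(v, v))
    return [x / (norm + eps) for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def burst_weights(features, temperature=0.1):
    """Hypothetical self-similarity discount (an assumption, not the paper's formula).

    weight_i = 1 / sum_j exp((cos(f_i, f_j) - 1) / temperature)
    A feature with n near-duplicates gets a weight near 1/n;
    a feature unlike all others gets a weight near 1.
    """
    unit = [l2_normalize(f) for f in features]
    weights = []
    for fi in unit:
        support = sum(math.exp((dot(fi, fj) - 1.0) / temperature) for fj in unit)
        weights.append(1.0 / support)
    return weights

def soft_vlad(features, centers, alpha=10.0, weights=None):
    """Soft-assignment VLAD: per-cluster sum of (optionally weighted) residuals."""
    if weights is None:
        weights = [1.0] * len(features)
    dim = len(centers[0])
    vlad = [[0.0] * dim for _ in centers]
    for f, w in zip(features, weights):
        # Soft assignment from negative squared distances to each center.
        scores = [-alpha * sum((x - c) ** 2 for x, c in zip(f, ck)) for ck in centers]
        assign = softmax(scores)
        for k, ck in enumerate(centers):
            for d in range(dim):
                vlad[k][d] += w * assign[k] * (f[d] - ck[d])
    # Flatten the per-cluster residual sums and L2-normalize the descriptor.
    return l2_normalize([x for row in vlad for x in row])

# Toy input: five copies of one feature (a "burst") and one distinct feature.
features = [[1.0, 0.0]] * 5 + [[0.0, 1.0]]
centers = [[0.0, 0.0]]  # a single cluster keeps the example easy to follow

plain = soft_vlad(features, centers)
burst = soft_vlad(features, centers, weights=burst_weights(features))
```

In this toy case the repeated feature dominates the unweighted descriptor, while the discounted descriptor gives the repeated and the distinct feature roughly equal influence.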

Grant ID

http://purl.org/au-research/grants/arc/FL210100156

Published Version

https://link.springer.com/book/10.1007/978-3-031-72784-9

Persistent link to this record

https://hdl.handle.net/2440/144308
