Please use this identifier to cite or link to this item: http://hdl.handle.net/2440/120057
Type: Conference paper
Title: Bottom-up and top-down attention for image captioning and visual question answering
Author: Anderson, P.
He, X.
Buehler, C.
Teney, D.
Johnson, M.
Gould, S.
Zhang, L.
Citation: Proceedings: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2018), 2018 / pp.6077-6086
Publisher: IEEE
Issue Date: 2018
Series/Report no.: IEEE Conference on Computer Vision and Pattern Recognition
ISBN: 9781538664209
ISSN: 1063-6919
2575-7075
Conference Name: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (18 Jun 2018 - 23 Jun 2018 : Salt Lake City, UT)
Statement of Responsibility: Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, Lei Zhang
Abstract: Top-down visual attention mechanisms have been used extensively in image captioning and visual question answering (VQA) to enable deeper image understanding through fine-grained analysis and even multiple steps of reasoning. In this work, we propose a combined bottom-up and top-down attention mechanism that enables attention to be calculated at the level of objects and other salient image regions. This is the natural basis on which to consider attention. Within our approach, the bottom-up mechanism (based on Faster R-CNN) proposes image regions, each with an associated feature vector, while the top-down mechanism determines feature weightings. Applying this approach to image captioning, our results on the MSCOCO test server establish a new state of the art for the task, achieving CIDEr / SPICE / BLEU-4 scores of 117.9, 21.5 and 36.9, respectively. Demonstrating the broad applicability of the method, we apply the same approach to VQA and obtain first place in the 2017 VQA Challenge.
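The abstract splits the work between a bottom-up stage that proposes image regions with feature vectors and a top-down stage that weights those features. A minimal pure-Python sketch of the top-down weighting step follows. It assumes the bottom-up stage has already produced k region feature vectors (random stand-ins here for Faster R-CNN outputs) and assumes an additive, tanh-based scoring function a_i = w_a^T tanh(W_va v_i + W_ha h), followed by a softmax over regions and a weighted sum. The names W_va, W_ha, w_a, the helper functions, and all dimensions are hypothetical; this illustrates soft attention over region features and is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_down_attention(
    regions: Matrix,       # k bottom-up region features v_i, each of length feat_dim
    h: Sequence[float],    # top-down context (e.g. a decoder state or question encoding)
    W_va: Matrix,          # att_dim x feat_dim projection of region features (hypothetical)
    W_ha: Matrix,          # att_dim x ctx_dim projection of the context (hypothetical)
    w_a: Sequence[float],  # att_dim scoring vector (hypothetical)
) -> Tuple[Vector, Vector]:
    """Return attention weights over regions and the attended (weighted-sum) feature."""
    proj_h = matvec(W_ha, h)  # project the context once, reuse it for every region
    scores = []
    for v in regions:
        hidden = [math.tanh(a + b) for a, b in zip(matvec(W_va, v), proj_h)]
        scores.append(sum(w * x for w, x in zip(w_a, hidden)))  # score a_i
    alpha = softmax(scores)  # feature weightings over regions, summing to 1
    feat_dim = len(regions[0])
    v_hat = [sum(alpha[i] * regions[i][d] for i in range(len(regions)))
             for d in range(feat_dim)]
    return alpha, v_hat


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Usage with random toy weights (illustrative only; no trained model involved).
random.seed(0)
k, feat_dim, ctx_dim, att_dim = 4, 6, 5, 3
regions = rand_matrix(k, feat_dim)
h = [random.uniform(-1.0, 1.0) for _ in range(ctx_dim)]
W_va = rand_matrix(att_dim, feat_dim)
W_ha = rand_matrix(att_dim, ctx_dim)
w_a = [random.uniform(-1.0, 1.0) for _ in range(att_dim)]

alpha, v_hat = top_down_attention(regions, h, W_va, W_ha, w_a)
print("weights:", [round(a, 3) for a in alpha])
print("attended feature length:", len(v_hat))
```

Because the weights come from a softmax, they are positive and sum to 1, so the attended feature is a convex combination of the region vectors. In a trained model the random matrices would be replaced by learned parameters, and h would come from the captioning decoder or the question encoder.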
Rights: © 2018 IEEE
RMID: 0030108853
DOI: 10.1109/CVPR.2018.00636
Grant ID: http://purl.org/au-research/grants/arc/CE140100016
http://purl.org/au-research/grants/arc/DP160102156
Published version: https://ieeexplore.ieee.org/xpl/conhome/8576498/proceeding
Appears in Collections: Australian Institute for Machine Learning publications

Files in This Item:
There are no files associated with this item.

