Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/115995
Full metadata record
DC Field                                    Value
dc.contributor.author                       Wang, P.
dc.contributor.author                       Wu, Q.
dc.contributor.author                       Shen, C.
dc.contributor.author                       van den Hengel, A.
dc.date.issued                              2017
dc.identifier.citation                      Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, vol. 2017-January, pp. 3909-3918
dc.identifier.isbn                          9781538604571
dc.identifier.issn                          1063-6919
dc.identifier.uri                           http://hdl.handle.net/2440/115995
dc.description.abstract                     One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations, from detection and counting to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine [10]. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decisions, and can still be trained end-to-end without ground-truth reasons being given. We demonstrate its effectiveness on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases.
dc.description.statementofresponsibility    Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel
dc.language.iso                             en
dc.publisher                                IEEE
dc.relation.ispartofseries                  IEEE Conference on Computer Vision and Pattern Recognition
dc.rights                                   © 2017 IEEE
dc.source.uri                               http://dx.doi.org/10.1109/cvpr.2017.416
dc.title                                    The VQA-machine: learning how to use existing vision algorithms to answer new questions
dc.type                                     Conference paper
dc.contributor.conference                   30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) (21 Jul 2017 - 26 Jul 2017 : Honolulu)
dc.identifier.doi                           10.1109/CVPR.2017.416
pubs.publication-status                     Published
dc.identifier.orcid                         Wu, Q. [0000-0003-3631-256X]
dc.identifier.orcid                         van den Hengel, A. [0000-0003-3027-8364]
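
The abstract above names a co-attention model as the core of the method. As an illustrative aid only, the sketch below shows a generic co-attention step between question words and image regions, written in PyTorch. The function name, the feature shapes, and the bilinear affinity parameterisation are assumptions made for illustration; they are not the authors' implementation.

    # Minimal co-attention sketch (illustrative; shapes and names are assumed).
    import torch
    import torch.nn.functional as F

    def co_attention(q_feats, v_feats, W):
        """Compute co-attended summaries of question and image features.

        q_feats: (T, d) word-level question features, T words (assumed shape)
        v_feats: (K, d) region-level image features, K regions (assumed shape)
        W:       (d, d) learned bilinear affinity weights (assumed)
        """
        # Affinity between every question word and every image region.
        C = q_feats @ W @ v_feats.T        # (T, K)
        # Normalise the same affinity matrix along each axis.
        a_v = F.softmax(C, dim=1)          # (T, K): per-word attention over regions
        a_q = F.softmax(C, dim=0)          # (T, K): per-region attention over words
        v_hat = a_v @ v_feats              # (T, d): attended image summary per word
        q_hat = a_q.T @ q_feats            # (K, d): attended question summary per region
        return q_hat, v_hat

    # Toy usage with random features.
    T, K, d = 6, 10, 64
    q_hat, v_hat = co_attention(torch.randn(T, d), torch.randn(K, d), torch.randn(d, d))
    print(q_hat.shape, v_hat.shape)        # torch.Size([10, 64]) torch.Size([6, 64])

The design point the sketch illustrates is symmetry: a single affinity matrix is normalised along each axis, so the question attends to the image and the image attends to the question in one pass.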
Appears in Collections:Aurora harvest 8
Australian Institute for Machine Learning publications
Computer Science publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.