Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/115995
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, P. | - |
dc.contributor.author | Wu, Q. | - |
dc.contributor.author | Shen, C. | - |
dc.contributor.author | van den Hengel, A. | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2017, vol. 2017-January, pp. 3909-3918 | - |
dc.identifier.isbn | 9781538604571 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/2440/115995 | - |
dc.description.abstract | One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and counting, to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine [10]. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground truth reasons being given. We demonstrate the effectiveness of the proposed method on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases. | - |
dc.description.statementofresponsibility | Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel | - |
dc.language.iso | en | - |
dc.publisher | IEEE | - |
dc.relation.ispartofseries | IEEE Conference on Computer Vision and Pattern Recognition | - |
dc.rights | © 2017 IEEE | - |
dc.source.uri | http://dx.doi.org/10.1109/cvpr.2017.416 | - |
dc.title | The VQA-machine: learning how to use existing vision algorithms to answer new questions | - |
dc.type | Conference paper | - |
dc.contributor.conference | 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) (21 Jul 2017 - 26 Jul 2017 : Honolulu) | - |
dc.identifier.doi | 10.1109/CVPR.2017.416 | - |
pubs.publication-status | Published | - |
dc.identifier.orcid | Wu, Q. [0000-0003-3631-256X] | - |
dc.identifier.orcid | van den Hengel, A. [0000-0003-3027-8364] | - |
Appears in Collections: | Aurora harvest 8; Australian Institute for Machine Learning publications; Computer Science publications |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
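For readers unfamiliar with the co-attention idea mentioned in the abstract, the sketch below shows a generic question-guided image attention step followed by an image-guided question attention step. It is not the architecture from the paper; all module names, tensor shapes, and hyper-parameters are illustrative assumptions.

```python
# Illustrative sketch only: a generic co-attention step for VQA-style models,
# NOT the architecture described in the paper above. All names, shapes, and
# hyper-parameters are assumptions chosen for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCoAttention(nn.Module):
    """Attend over image regions conditioned on the question, then attend
    over question tokens conditioned on the attended image summary."""

    def __init__(self, img_dim=2048, q_dim=512, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        self.img_att = nn.Linear(hidden, 1)
        self.q_att = nn.Linear(hidden, 1)

    def forward(self, img_feats, q_feats):
        # img_feats: (B, R, img_dim) -- R image region features
        # q_feats:   (B, T, q_dim)   -- T question token embeddings
        q_summary = q_feats.mean(dim=1)                              # (B, q_dim)

        # Question-guided attention over image regions.
        joint_v = torch.tanh(self.img_proj(img_feats) +
                             self.q_proj(q_summary).unsqueeze(1))    # (B, R, hidden)
        v_weights = F.softmax(self.img_att(joint_v), dim=1)          # (B, R, 1)
        attended_img = (v_weights * img_feats).sum(dim=1)            # (B, img_dim)

        # Image-guided attention over question tokens.
        joint_q = torch.tanh(self.q_proj(q_feats) +
                             self.img_proj(attended_img).unsqueeze(1))  # (B, T, hidden)
        q_weights = F.softmax(self.q_att(joint_q), dim=1)            # (B, T, 1)
        attended_q = (q_weights * q_feats).sum(dim=1)                # (B, q_dim)

        return attended_img, attended_q


# Minimal usage example with random tensors (hypothetical sizes).
if __name__ == "__main__":
    model = SimpleCoAttention()
    img = torch.randn(2, 36, 2048)   # 36 region features per image
    que = torch.randn(2, 14, 512)    # 14 word embeddings per question
    v, q = model(img, que)
    print(v.shape, q.shape)          # torch.Size([2, 2048]) torch.Size([2, 512])
```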