Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/115995
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Wang, P. | - |
dc.contributor.author | Wu, Q. | - |
dc.contributor.author | Shen, C. | - |
dc.contributor.author | van den Hengel, A. | - |
dc.date.issued | 2017 | - |
dc.identifier.citation | Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2017, vol. 2017-January, pp. 3909-3918 | - |
dc.identifier.isbn | 9781538604571 | - |
dc.identifier.issn | 1063-6919 | - |
dc.identifier.uri | http://hdl.handle.net/2440/115995 | - |
dc.description.abstract | One of the most intriguing features of the Visual Question Answering (VQA) challenge is the unpredictability of the questions. Extracting the information required to answer them demands a variety of image operations from detection and counting, to segmentation and reconstruction. To train a method to perform even one of these operations accurately from {image, question, answer} tuples would be challenging, but to aim to achieve them all with a limited set of such training data seems ambitious at best. Our method thus learns how to exploit a set of external off-the-shelf algorithms to achieve its goal, an approach that has something in common with the Neural Turing Machine [10]. The core of our proposed method is a new co-attention model. In addition, the proposed approach generates human-readable reasons for its decision, and can still be trained end-to-end without ground truth reasons being given. We demonstrate the effectiveness of the proposed method on two publicly available datasets, Visual Genome and VQA, and show that it produces state-of-the-art results in both cases. | - |
dc.description.statementofresponsibility | Peng Wang, Qi Wu, Chunhua Shen, Anton van den Hengel | - |
dc.language.iso | en | - |
dc.publisher | IEEE | - |
dc.relation.ispartofseries | IEEE Conference on Computer Vision and Pattern Recognition | - |
dc.rights | © 2017 IEEE | - |
dc.source.uri | http://dx.doi.org/10.1109/cvpr.2017.416 | - |
dc.title | The VQA-machine: learning how to use existing vision algorithms to answer new questions | - |
dc.type | Conference paper | - |
dc.contributor.conference | 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017) (21 Jul 2017 - 26 Jul 2017 : Honolulu) | - |
dc.identifier.doi | 10.1109/CVPR.2017.416 | - |
pubs.publication-status | Published | - |
dc.identifier.orcid | Wu, Q. [0000-0003-3631-256X] | - |
dc.identifier.orcid | van den Hengel, A. [0000-0003-3027-8364] | - |
Appears in Collections: | Aurora harvest 8; Australian Institute for Machine Learning publications; Computer Science publications |
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
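For readers unfamiliar with the co-attention idea mentioned in the abstract, the sketch below shows a generic question-guided image attention step followed by an image-guided question attention step. It is not the architecture from the paper; all module names, tensor shapes, and hyper-parameters are illustrative assumptions.

```python
# Illustrative sketch only: a generic co-attention step for VQA-style models,
# NOT the architecture described in the paper above. All names, shapes, and
# hyper-parameters are assumptions chosen for demonstration.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleCoAttention(nn.Module):
    """Attend over image regions conditioned on the question, then attend
    over question tokens conditioned on the attended image summary."""

    def __init__(self, img_dim=2048, q_dim=512, hidden=512):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)
        self.q_proj = nn.Linear(q_dim, hidden)
        self.img_att = nn.Linear(hidden, 1)
        self.q_att = nn.Linear(hidden, 1)

    def forward(self, img_feats, q_feats):
        # img_feats: (B, R, img_dim) -- R image region features
        # q_feats:   (B, T, q_dim)   -- T question token embeddings
        q_summary = q_feats.mean(dim=1)                              # (B, q_dim)

        # Question-guided attention over image regions.
        joint_v = torch.tanh(self.img_proj(img_feats) +
                             self.q_proj(q_summary).unsqueeze(1))    # (B, R, hidden)
        v_weights = F.softmax(self.img_att(joint_v), dim=1)          # (B, R, 1)
        attended_img = (v_weights * img_feats).sum(dim=1)            # (B, img_dim)

        # Image-guided attention over question tokens.
        joint_q = torch.tanh(self.q_proj(q_feats) +
                             self.img_proj(attended_img).unsqueeze(1))  # (B, T, hidden)
        q_weights = F.softmax(self.q_att(joint_q), dim=1)            # (B, T, 1)
        attended_q = (q_weights * q_feats).sum(dim=1)                # (B, q_dim)

        return attended_img, attended_q


# Minimal usage example with random tensors (hypothetical sizes).
if __name__ == "__main__":
    model = SimpleCoAttention()
    img = torch.randn(2, 36, 2048)   # 36 region features per image
    que = torch.randn(2, 14, 512)    # 14 word embeddings per question
    v, q = model(img, que)
    print(v.shape, q.shape)          # torch.Size([2, 2048]) torch.Size([2, 512])
```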