Visual Question Answering as a meta learning task

Teney, D.; Van Den Hengel, A.

doi:10.1007/978-3-030-01267-0_14

dc.contributor.author	Teney, D.
dc.contributor.author	Van Den Hengel, A.
dc.contributor.conference	15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich)
dc.contributor.editor	Ferrari, V.
dc.contributor.editor	Hebert, M.
dc.contributor.editor	Sminchisescu, C.
dc.contributor.editor	Weiss, Y.
dc.date.issued	2018
dc.description.abstract	The predominant approach to Visual Question Answering (VQA) demands that the model represents within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question answering method from the information required. At test time, the method is provided with a support set of example questions/answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks. Experiments demonstrate the capability of the system to learn to produce completely novel answers (i.e. never seen during training) from examples provided at test time. In comparison to the existing state of the art, the proposed method produces qualitatively distinct results with higher recall of rare answers, and a better sample efficiency that allows training with little initial data. More importantly, it represents an important step towards vision-and-language methods that can learn and reason on-the-fly.
dc.description.statementofresponsibility	Damien Teney and Anton van den Hengel
dc.identifier.citation	Lecture Notes in Artificial Intelligence, 2018 / Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (ed./s), vol.11219 LNCS, pp.229-245
dc.identifier.doi	10.1007/978-3-030-01267-0_14
dc.identifier.isbn	9783030012663
dc.identifier.issn	0302-9743
dc.identifier.issn	1611-3349
dc.identifier.orcid	Teney, D. [0000-0003-2130-6650]
dc.identifier.orcid	Van Den Hengel, A. [0000-0003-3027-8364]
dc.identifier.uri	http://hdl.handle.net/2440/116282
dc.language.iso	en
dc.publisher	Springer
dc.relation.ispartofseries	Lecture Notes in Computer Science; 11219
dc.rights	© Springer Nature Switzerland AG 2018
dc.source.uri	https://doi.org/10.1007/978-3-030-01267-0_14
dc.title	Visual Question Answering as a meta learning task
dc.type	Conference paper
pubs.publication-status	Published

Collections

Australian Institute for Machine Learning publications
