Tips and tricks for visual question answering: learnings from the 2017 challenge

Teney, D.; Anderson, P.; He, X.; Van Den Hengel, A.

doi:10.1109/CVPR.2018.00444

dc.contributor.author	Teney, D.
dc.contributor.author	Anderson, P.
dc.contributor.author	He, X.
dc.contributor.author	Van Den Hengel, A.
dc.contributor.conference	IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (18 Jun 2018 - 23 Jun 2018 : Salt Lake City, USA)
dc.date.issued	2018
dc.description.abstract	Deep Learning has had a transformative impact on Computer Vision, but for all of the success there is also a significant cost. This is that the models and procedures used are so complex and intertwined that it is often impossible to distinguish the impact of the individual design and engineering choices each model embodies. This ambiguity diverts progress in the field, and leads to a situation where developing a state-of-the-art model is as much an art as a science. As a step towards addressing this problem we present a massive exploration of the effects of the myriad architectural and hyperparameter choices that must be made in generating a state-of-the-art model. The model is of particular interest because it won the 2017 Visual Question Answering Challenge. We provide a detailed analysis of the impact of each choice on model performance, in the hope that it will inform others in developing models, but also that it might set a precedent that will accelerate scientific progress in the field.
dc.description.statementofresponsibility	Damien Teney, Peter Anderson, Xiaodong He, Anton van den Hengel
dc.identifier.citation	Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, pp.4223-4232
dc.identifier.doi	10.1109/CVPR.2018.00444
dc.identifier.isbn	9781538664216
dc.identifier.issn	1063-6919
dc.identifier.issn	2575-7075
dc.identifier.orcid	Teney, D. [0000-0003-2130-6650]
dc.identifier.orcid	Van Den Hengel, A. [0000-0003-3027-8364]
dc.identifier.uri	http://hdl.handle.net/2440/128334
dc.language.iso	en
dc.publisher	IEEE
dc.relation.ispartofseries	IEEE Conference on Computer Vision and Pattern Recognition
dc.rights	© 2018 IEEE
dc.source.uri	https://ieeexplore.ieee.org/xpl/conhome/8576498/proceeding
dc.title	Tips and tricks for visual question answering: learnings from the 2017 challenge
dc.type	Conference paper
pubs.publication-status	Published

Collections

Australian Institute for Machine Learning publications
