Overcoming language priors in VQA via decomposed linguistic representations
Date
2020
Authors
Jing, C.
Wu, Y.
Zhang, X.
Jia, Y.
Wu, Q.
Type
Conference paper
Citation
Proceedings of the ... AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, 2020, vol.34, iss.7, pp.11181-11188
Statement of Responsibility
Chenchen Jing, Yuwei Wu, Xiaoxun Zhang, Yunde Jia, Qi Wu
Conference Name
AAAI Conference on Artificial Intelligence (AAAI) (7 Feb 2020 - 12 Feb 2020 : New York, USA)
Abstract
Most existing Visual Question Answering (VQA) models rely too heavily on language priors between questions and answers. In this paper, we present a novel language attention-based VQA method that learns decomposed linguistic representations of questions and uses these representations to infer answers, thereby overcoming language priors. We introduce a modular language attention mechanism that parses a question into three phrase representations: a type representation, an object representation, and a concept representation. We use the type representation to identify the question type and the possible answer set (yes/no, or specific concepts such as colors or numbers), and the object representation to focus on the relevant region of an image. The concept representation is verified against the attended region to infer the final answer. The proposed method decouples language-based concept discovery from vision-based concept verification during answer inference, which prevents language priors from dominating the answering process. Experiments on the VQA-CP dataset demonstrate the effectiveness of our method.
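The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dot-product soft attention, the toy 4-dimensional vectors, and every name here (`attend`, `infer_answer`, `module_queries`, `type_vecs`, `answer_sets`, `answer_vecs`) are assumptions made for illustration. In particular, scoring candidate concepts directly against the attended region for non-yes/no questions is a simplification chosen for this sketch.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over a 1-D score vector.
    shifted = np.exp(scores - np.max(scores))
    return shifted / shifted.sum()

def attend(items, query):
    # Dot-product soft attention (an assumed form): returns the
    # weighted sum of the rows of `items` and the attention weights.
    weights = softmax(items @ query)
    return weights @ items, weights

def infer_answer(word_vecs, region_feats, module_queries,
                 type_vecs, answer_sets, answer_vecs):
    # 1. Modular language attention: one module per phrase kind,
    #    all reading the same word vectors.
    reps = {name: attend(word_vecs, q)[0]
            for name, q in module_queries.items()}
    # 2. The type representation selects the question type and
    #    therefore the candidate answer set.
    q_type = max(type_vecs, key=lambda t: float(type_vecs[t] @ reps["type"]))
    # 3. The object representation attends to the relevant image region.
    region, _ = attend(region_feats, reps["object"])
    # 4. Vision-based verification against the attended region.
    if q_type == "yes/no":
        score = float(region @ reps["concept"])
        return q_type, ("yes" if score > 0 else "no")
    # Assumed simplification: rank candidate concepts by the region alone.
    best = max(answer_sets[q_type],
               key=lambda a: float(answer_vecs[a] @ region))
    return q_type, best

# Toy data (hypothetical): three question words, two image regions.
e = np.eye(4)
word_vecs = 5.0 * e[:3]
module_queries = {"type": e[0], "object": e[1], "concept": e[2]}
type_vecs = {"color": e[0], "yes/no": e[3]}
answer_sets = {"color": ["yellow", "green"], "yes/no": ["yes", "no"]}
answer_vecs = {"yellow": e[3], "green": -e[3]}
region_feats = np.array([[0.0, 3.0, 0.0, 1.0],
                         [0.0, 0.0, 0.0, -1.0]])

q_type, answer = infer_answer(word_vecs, region_feats, module_queries,
                              type_vecs, answer_sets, answer_vecs)
print(q_type, answer)
```

In this sketch the candidate set is fixed by the question type before any visual evidence is consulted, and the final choice depends only on the attended region, which mirrors the decoupling of concept discovery from concept verification that the abstract describes.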
Rights
© 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.