AIML at VQA-Med 2020: Knowledge inference via a skeleton-based sentence mapping approach for medical domain visual question answering
dc.contributor.author | Liao, Z. | |
dc.contributor.author | Wu, Q. | |
dc.contributor.author | Shen, C. | |
dc.contributor.author | Van Den Hengel, A. | |
dc.contributor.author | Verjans, J. | |
dc.contributor.conference | International Conference of the CLEF Initiative (CLEF) (22 Sep 2020 - 25 Sep 2020 : virtual online) | |
dc.contributor.editor | Cappellato, L. | |
dc.contributor.editor | Eickhoff, C. | |
dc.contributor.editor | Ferro, N. | |
dc.contributor.editor | Névéol, A. | |
dc.date.issued | 2020 | |
dc.description | Session - ImageCLEF: Multimedia Retrieval in Medicine, Lifelogging, and Internet. | |
dc.description.abstract | In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions placed first on the VQA challenge leaderboard and also first on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem, which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization that facilitates the optimization of multiple tasks in a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we framed the VQG task as a transfer learning problem using the models trained on the VQA task. The VQG task was also solved using classification. | |
dc.description.statementofresponsibility | Zhibin Liao, Qi Wu, Chunhua Shen, Anton van den Hengel, and Johan Verjans | |
dc.identifier.citation | CEUR Workshop Proceedings, 2020 / Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (ed./s), vol.2696, pp.1-14 | |
dc.identifier.issn | 1613-0073 | |
dc.identifier.orcid | Liao, Z. [0000-0001-9965-4511] | |
dc.identifier.orcid | Wu, Q. [0000-0003-3631-256X] | |
dc.identifier.orcid | Van Den Hengel, A. [0000-0003-3027-8364] | |
dc.identifier.orcid | Verjans, J. [0000-0002-8336-6774] | |
dc.identifier.uri | https://hdl.handle.net/2440/132209 | |
dc.language.iso | en | |
dc.publisher | CEUR-WS | |
dc.publisher.place | online | |
dc.relation.ispartofseries | CEUR Workshop Proceedings; 2696 | |
dc.rights | Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). | |
dc.source.uri | http://ceur-ws.org/Vol-2696 | |
dc.subject | Visual Question Answering; Visual Question Generation; Knowledge Inference; Deep Neural Networks; Skeleton-based Sentence Mapping; Class-wise and Task-wise Normalization | |
dc.title | AIML at VQA-Med 2020: Knowledge inference via a skeleton-based sentence mapping approach for medical domain visual question answering | |
dc.type | Conference paper | |
pubs.publication-status | Published | |
Files
Original bundle
- Name: hdl_132209.pdf
- Size: 501.4 KB
- Format: Adobe Portable Document Format
- Description: Published version