Please use this identifier to cite or link to this item:
https://hdl.handle.net/2440/132209
Type: | Conference paper |
Title: | AIML at VQA-Med 2020: Knowledge inference via a skeleton-based sentence mapping approach for medical domain visual question answering |
Author: | Liao, Z. Wu, Q. Shen, C. Van Den Hengel, A. Verjans, J. |
Citation: | CEUR Workshop Proceedings, 2020 / Cappellato, L., Eickhoff, C., Ferro, N., Névéol, A. (ed./s), vol.2696, pp.1-14 |
Publisher: | CEUR-WS |
Publisher Place: | online |
Issue Date: | 2020 |
Series/Report no.: | CEUR Workshop Proceedings; 2696 |
ISSN: | 1613-0073 |
Conference Name: | International Conference of the CLEF Initiative (CLEF) (22 Sep 2020 - 25 Sep 2020 : virtual online) |
Editor: | Cappellato, L. Eickhoff, C. Ferro, N. Névéol, A. |
Statement of Responsibility: | Zhibin Liao, Qi Wu, Chunhua Shen, Anton van den Hengel, and Johan Verjans |
Abstract: | In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions scored first place on the VQA challenge leaderboard, and also the first place on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization facilitating optimization of multiple tasks in a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we positioned the VQG task as a transfer learning problem using the VQA task trained models. The VQG task was also solved using classification. |
Keywords: | Visual Question Answering; Visual Question Generation; Knowledge Inference; Deep Neural Networks; Skeleton-based Sentence Mapping; Class-wise and Task-wise Normalization |
Description: | Session - ImageCLEF: Multimedia Retrieval in Medicine, Lifelogging, and Internet. |
Rights: | Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). |
Published version: | http://ceur-ws.org/Vol-2696 |
Appears in Collections: | Australian Institute for Machine Learning publications |
Files in This Item:
File | Description | Size | Format |
---|---|---|---|
hdl_132209.pdf | Published version | 501.4 kB | Adobe PDF |