Australian Institute for Machine Learning
Machine learning underpins the business models of the largest corporations and has the potential to deliver massive social, economic and environmental benefits. Our world-class research strengths lie in machine learning and the methods that support it: artificial intelligence, computer vision and deep learning.
Browsing Australian Institute for Machine Learning by Author "15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich)"
Item (Metadata only): Goal-oriented visual question generation via intermediate rewards (Springer, 2018)
Zhang, J.; Wu, Q.; Shen, C.; Zhang, J.; Lu, J.; van den Hengel, A.; 15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich); Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y.
Abstract: Despite significant progress in a variety of vision-and-language problems, developing a method capable of asking intelligent, goal-oriented questions about images has proven to be an inscrutable challenge. Towards this end, we propose a Deep Reinforcement Learning framework based on three new intermediate rewards, namely goal-achieved, progressive and informativeness, that encourage the generation of succinct questions, which in turn uncover valuable information towards the overall goal. By directly optimizing for questions that work quickly towards fulfilling the overall goal, we avoid the tendency of existing methods to generate long series of inane queries that add little value. We evaluate our model on the GuessWhat?! dataset and show that the resulting questions can help a standard ‘Guesser’ identify a specific object in an image at a much higher success rate.

Item (Metadata only): Multi-modal cycle-consistent generalized zero-shot learning (Springer, 2018)
Felix, R.; Vijay Kumar, B.; Reid, I.; Carneiro, G.; 15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich); Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y.
Abstract: In generalized zero-shot learning (GZSL), the set of classes is split into seen and unseen classes, where training relies on the semantic features of the seen and unseen classes and the visual representations of only the seen classes, while testing uses the visual representations of both the seen and unseen classes. Current methods address GZSL by learning a transformation from the visual to the semantic space, exploiting the assumption that the distribution of classes in the semantic and visual spaces is relatively similar. Such methods tend to transform unseen testing visual representations into the semantic features of one of the seen classes rather than those of the correct unseen class, resulting in low GZSL classification accuracy. Recently, generative adversarial networks (GANs) have been explored to synthesize visual representations of the unseen classes from their semantic features; the synthesized representations of the seen and unseen classes are then used to train the GZSL classifier. This approach has been shown to boost GZSL classification accuracy, but one important constraint is missing: there is no guarantee that the synthetic visual representations can generate back their semantic features in a multi-modal cycle-consistent manner. This missing constraint can result in synthetic visual representations that do not represent their semantic features well, which means that enforcing this constraint can improve GAN-based approaches. In this paper, we propose such a constraint in the form of a new regularization for GAN training that forces the generated visual features to reconstruct their original semantic features. Once our model is trained with this multi-modal cycle-consistent semantic compatibility, we can then synthesize more representative visual representations for the seen and, more importantly, for the unseen classes. Our proposed approach achieves the best GZSL classification results in the field on several publicly available datasets.
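The cycle-consistency regularizer described in the abstract above can be illustrated with a short sketch. The PyTorch snippet below is a minimal, illustrative reading of the idea (not the authors' implementation): a generator maps a semantic feature plus noise to a synthetic visual feature, a regressor maps it back, and a reconstruction penalty ties the two together. The feature dimensions, network shapes and loss weight lambda_cyc are assumptions made for illustration.

```python
# A minimal sketch, in PyTorch, of the multi-modal cycle-consistency idea above.
# The feature dimensions, network shapes and loss weight are illustrative
# assumptions, not values taken from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

sem_dim, vis_dim, noise_dim = 300, 2048, 100   # assumed feature sizes

# Generator: semantic feature + noise -> synthetic visual feature.
generator = nn.Sequential(
    nn.Linear(sem_dim + noise_dim, 1024), nn.ReLU(),
    nn.Linear(1024, vis_dim))

# Regressor: visual feature -> reconstructed semantic feature.
regressor = nn.Sequential(
    nn.Linear(vis_dim, 1024), nn.ReLU(),
    nn.Linear(1024, sem_dim))

def cycle_consistency_loss(semantic, lambda_cyc=1.0):
    """Penalise synthetic visual features whose reconstructed semantics
    drift from the semantic features that generated them."""
    noise = torch.randn(semantic.size(0), noise_dim)
    fake_visual = generator(torch.cat([semantic, noise], dim=1))
    reconstructed = regressor(fake_visual)
    return lambda_cyc * F.mse_loss(reconstructed, semantic)

# During training this term would be added to the usual GAN generator loss,
# so that features synthesised for seen and unseen classes stay semantically faithful.
semantic_batch = torch.randn(8, sem_dim)        # placeholder class embeddings
loss = cycle_consistency_loss(semantic_batch)
loss.backward()
```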
Item (Metadata only): Visual Question Answering as a meta learning task (Springer, 2018)
Teney, D.; van den Hengel, A.; 15th European Conference on Computer Vision (ECCV 2018) (8 Sep 2018 - 14 Sep 2018 : Munich); Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y.
Abstract: The predominant approach to Visual Question Answering (VQA) demands that the model represent within its weights all of the information required to answer any question about any image. Learning this information from any real training set seems unlikely, and representing it in a reasonable number of weights doubly so. We propose instead to approach VQA as a meta learning task, thus separating the question-answering method from the information required. At test time, the method is provided with a support set of example questions and answers, over which it reasons to resolve the given question. The support set is not fixed and can be extended without retraining, thereby expanding the capabilities of the model. To exploit this dynamically provided information, we adapt a state-of-the-art VQA model with two techniques from the recent meta learning literature, namely prototypical networks and meta networks. Experiments demonstrate the capability of the system to learn to produce completely novel answers (i.e. never seen during training) from examples provided at test time. In comparison to the existing state of the art, the proposed method produces qualitatively distinct results with higher recall of rare answers and better sample efficiency, allowing training with little initial data. More importantly, it represents an important step towards vision-and-language methods that can learn and reason on the fly.
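The prototypical-network component mentioned in the abstract above can likewise be sketched. The PyTorch snippet below is a minimal, assumed illustration of how candidate answers could be scored against prototypes built from a dynamically provided support set; the joint encoder, feature sizes and support-set format are placeholders, and the paper integrates this idea into a full VQA architecture rather than using it stand-alone.

```python
# A minimal sketch, in PyTorch, of a prototypical-network answer scorer driven by
# a support set. The joint encoder, feature sizes and support-set format are
# placeholder assumptions, not the paper's architecture.
import torch
import torch.nn as nn

feat_dim, emb_dim = 1024, 512
embed = nn.Linear(feat_dim, emb_dim)   # stand-in for a joint image+question encoder

def answer_scores(query_feat, support_feats, support_answers, num_answers):
    """Score candidate answers for a query by (negative) distance to per-answer
    prototypes built from the dynamically provided support set."""
    support_emb = embed(support_feats)              # (n_support, emb_dim)
    query_emb = embed(query_feat)                   # (1, emb_dim)
    protos = []
    for a in range(num_answers):
        mask = support_answers == a
        if mask.any():                              # prototype = mean embedding of that answer's examples
            protos.append(support_emb[mask].mean(dim=0))
        else:                                       # no support example for this answer
            protos.append(torch.zeros(emb_dim))
    prototypes = torch.stack(protos)                # (num_answers, emb_dim)
    # Negative squared Euclidean distance to each prototype serves as the answer score.
    return -((query_emb - prototypes) ** 2).sum(dim=1)

# Usage: growing the support set with new question/answer pairs adds prototypes
# (including for answers never seen in training) without retraining the encoder.
support_feats = torch.randn(20, feat_dim)           # placeholder joint features
support_answers = torch.randint(0, 5, (20,))        # answer index per support example
query = torch.randn(1, feat_dim)
scores = answer_scores(query, support_feats, support_answers, num_answers=5)
predicted_answer = scores.argmax().item()
```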