Australian Institute for Machine Learning publications
Browsing Australian Institute for Machine Learning publications by Title
Now showing 1 - 20 of 123
Item (Metadata only): A Bayesian data augmentation approach for learning deep models (Neural Information Processing Systems Foundation, 2018). Tran, T.; Pham, T.; Carneiro, G.; Palmer, L.; Reid, I.; NIPS Foundation Inc (4 Dec 2017 - 9 Dec 2017 : Long Beach, CA); Guyon, I.; Luxburg, U.V.; Bengio, S.; Wallach, H.; Fergus, R.; Vishwanathan, S.; Garnett, R.
Data augmentation is an essential part of the training process applied to deep learning models. The motivation is that a robust training process for deep learning models depends on large annotated datasets, which are expensive to acquire, store and process. A reasonable alternative is therefore to generate new annotated training samples automatically, a process known as data augmentation. The dominant data augmentation approach in the field assumes that new training samples can be obtained via random geometric or appearance transformations applied to annotated training samples, but this is a strong assumption because it is unclear whether this is a reliable generative model for producing new training samples. In this paper, we provide a novel Bayesian formulation of data augmentation, where new annotated training points are treated as missing variables and generated based on the distribution learned from the training set. For learning, we introduce a theoretically sound algorithm, generalised Monte Carlo expectation maximisation, and demonstrate one possible implementation via an extension of the Generative Adversarial Network (GAN). Classification results on MNIST, CIFAR-10 and CIFAR-100 show the better performance of our proposed method compared to the current dominant data augmentation approach mentioned above; the results also show that our approach produces better classification results than similar GAN models.
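As a rough illustration of the GAN-style augmentation loop sketched in the entry above, the following Python fragment mixes real annotated samples with samples drawn from a class-conditional generator when updating a classifier. The tiny generator, classifier and random "dataset" are placeholders invented for illustration, not the authors' generalised Monte Carlo EM implementation.

    # Sketch only: training a classifier on the union of real and
    # generator-produced labelled samples. All modules and data are placeholders.
    import torch
    import torch.nn as nn

    num_classes, latent_dim, feat_dim = 10, 16, 32

    generator = nn.Sequential(nn.Linear(latent_dim + num_classes, feat_dim), nn.ReLU(),
                              nn.Linear(feat_dim, feat_dim))      # class-conditional generator
    classifier = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    real_x = torch.randn(256, feat_dim)                           # stand-in for annotated data
    real_y = torch.randint(0, num_classes, (256,))

    for step in range(100):
        idx = torch.randint(0, real_x.size(0), (32,))
        x, y = real_x[idx], real_y[idx]

        # Draw synthetic labelled samples from the (here untrained) generator.
        fake_y = torch.randint(0, num_classes, (32,))
        z = torch.cat([torch.randn(32, latent_dim),
                       nn.functional.one_hot(fake_y, num_classes).float()], dim=1)
        with torch.no_grad():
            fake_x = generator(z)

        # Update the classifier on real and generated samples together.
        opt.zero_grad()
        loss = loss_fn(classifier(torch.cat([x, fake_x])), torch.cat([y, fake_y]))
        loss.backward()
        opt.step()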
Item (Metadata only): A bidirectional graph neural network for traveling salesman problems on arbitrary symmetric graphs (Elsevier, 2021). Hu, Y.; Zhang, Z.; Yao, Y.; Huyan, X.; Zhou, X.; Lee, W.S.
Deep learning has recently achieved strong results on the traveling salesman problem (TSP) on Euclidean graphs. These methods usually represent the graph fully by a set of coordinates and then capture graph information from the coordinates to generate the solution. The TSP on arbitrary symmetric graphs models more realistic applications, where the working graphs may be sparse or the distance between points may not satisfy the triangle inequality. When prior learning-based methods are applied to the TSP on arbitrary symmetric graphs, they cannot capture graph features that are beneficial for producing near-optimal solutions. Moreover, they suffer from serious exploration problems. This paper proposes a bidirectional graph neural network (BGNN) for the arbitrary symmetric TSP. The network learns to produce the next city to visit sequentially by imitation learning. The bidirectional message-passing layer is designed as the most important component of BGNN; it encodes graphs based on edges and partial solutions. In this way, the proposed approach is far more likely to construct near-optimal solutions for the TSP on arbitrary symmetric graphs, and it can be combined with informed search to further improve performance.
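The sequential decoding described in the entry above can be pictured in a few lines of Python: a tour on a sparse symmetric graph is built city by city, with visited cities masked out of the candidate set. A trivial nearest-neighbour score stands in for the learned BGNN policy, so this is only a sketch of the decoding loop, not of the network itself.

    # Sketch only: greedy tour construction on an arbitrary symmetric graph.
    # A learned model would score candidate next cities; a nearest-neighbour
    # score stands in for it here.
    import numpy as np

    def construct_tour(weights, start=0):
        """weights: (n, n) symmetric matrix, np.inf where no edge exists."""
        n = weights.shape[0]
        visited = np.zeros(n, dtype=bool)
        tour = [start]
        visited[start] = True
        for _ in range(n - 1):
            current = tour[-1]
            scores = -weights[current].copy()    # stand-in for model scores
            scores[visited] = -np.inf            # mask cities already in the partial solution
            nxt = int(np.argmax(scores))
            if not np.isfinite(scores[nxt]):
                raise ValueError("no feasible next city on this sparse graph")
            tour.append(nxt)
            visited[nxt] = True
        return tour

    w = np.array([[np.inf, 2, 9, np.inf],
                  [2, np.inf, 6, 4],
                  [9, 6, np.inf, 3],
                  [np.inf, 4, 3, np.inf]], dtype=float)
    print(construct_tour(w))   # [0, 1, 3, 2]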
Item (Open Access): A Multi-modal Approach to Fine-grained Opinion Mining on Video Reviews (Association for Computational Linguistics, 2020). Marrese-Taylor, E.; Rodriguez Opazo, C.; Balazs, J.; Gould, S.; Matsuo, Y.; 58th Annual Meeting of the Association for Computational Linguistics (ACL) (5 Jul 2020 - 10 Jul 2020 : Seattle, USA)
Despite the recent advances in opinion mining for written reviews, few works have tackled the problem on other sources of reviews. In light of this issue, we propose a multimodal approach for mining fine-grained opinions from video reviews that is able to determine the aspects of the item under review that are being discussed and the sentiment orientation towards them. Our approach works at the sentence level without the need for time annotations and uses features derived from the audio, video and language transcriptions of its contents. We evaluate our approach on two datasets and show that leveraging the video and audio modalities consistently provides increased performance over text-only baselines, providing evidence that these extra modalities are key to better understanding video reviews.

Item (Metadata only): A New Higher Order Yang-Mills-Higgs Flow on Riemannian 4-Manifolds (Cambridge University Press, 2022). Saratchandran, H.; Zhang, J.; Zhang, P.
Let (M, g) be a closed Riemannian 4-manifold and let E be a vector bundle over M with structure group G, where G is a compact Lie group. We consider a new higher order Yang–Mills–Higgs functional, in which the Higgs field is a section of Ω⁰(adE). We show that, under suitable conditions, solutions to the gradient flow do not hit any finite-time singularities. In the case that E is a line bundle, we are able to use a different blow-up procedure and obtain an improvement of the long-time result of Zhang ['Gradient flows of higher order Yang–Mills–Higgs functionals', J. Aust. Math. Soc. 113 (2022), 257–287]. The proof relies on properties of the Green function, an approach very different from previous techniques.

Item (Metadata only): A relaxation method to articulated trajectory reconstruction from monocular image sequence (IEEE, 2014). Li, B.; Dai, Y.; He, M.; Van Den Hengel, A.; 2nd IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP 2014) (9 Jul 2014 - 13 Jul 2014 : Xi'an, China)
In this paper, we present a novel method for articulated trajectory reconstruction from a monocular image sequence. We propose a relaxation-based objective function, which utilises both smoothness and geometric constraints, posing articulated trajectory reconstruction as a non-linear optimization problem. The main advantage of this approach is that it retains the reconstructive power of the original algorithm while improving its robustness to the inevitable noise in the data. Furthermore, we present an effective approach to estimating the parameters of our objective function. Experimental results on the CMU motion capture dataset show that our proposed algorithm is effective.

Item (Metadata only): ABCNet: Real-time scene text spotting with adaptive Bezier-curve network (IEEE, 2020). Liu, Y.; Chen, H.; Shen, C.; He, T.; Jin, L.; Wang, L.; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (14 Jun 2020 - 19 Jun 2020 : virtual online)
Scene text detection and recognition have received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods are either costly because of character-level annotation or need to maintain a complex pipeline, which is often unsuitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit oriented or curved text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on oriented or curved benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy while significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with competitive recognition accuracy. Code is available at https://git.io/AdelaiDet.
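The Bezier-curve parameterisation at the heart of the ABCNet entry above is easy to make concrete: each text boundary is described by a small set of control points, and points along the curve are recovered from the Bernstein basis. The control points below are invented purely for illustration.

    # Sketch only: sampling a cubic Bezier curve from its four control points,
    # the kind of boundary representation ABCNet fits to curved text.
    import numpy as np

    def bezier_points(control, num=20):
        """control: (4, 2) array of control points; returns (num, 2) curve samples."""
        t = np.linspace(0.0, 1.0, num)[:, None]
        b = np.stack([(1 - t) ** 3,
                      3 * t * (1 - t) ** 2,
                      3 * t ** 2 * (1 - t),
                      t ** 3], axis=-1)          # Bernstein basis, shape (num, 1, 4)
        return (b @ control).squeeze(1)          # (num, 2) points along the curve

    ctrl = np.array([[0.0, 0.0], [30.0, 12.0], [70.0, -8.0], [100.0, 4.0]])
    curve = bezier_points(ctrl)
    print(curve[0], curve[-1])                   # endpoints match the outer control points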
Item (Open Access): Accurate tensor completion via adaptive low-rank representation (Institute of Electrical and Electronics Engineers, 2020). Zhang, L.; Wei, W.; Shi, Q.; Shen, C.; van den Hengel, A.; Zhang, Y.
Low-rank representation-based approaches that assume low-rank tensors and exploit their low-rank structure with appropriate prior models have underpinned much of the recent progress in tensor completion. However, real tensor data only approximately comply with the low-rank requirement in most cases, viz., the tensor consists of low-rank (e.g., principal part) as well as non-low-rank (e.g., details) structures, which limits the completion accuracy of these approaches. To address this problem, we propose an adaptive low-rank representation model for tensor completion that represents the low-rank and non-low-rank structures of a latent tensor separately in a Bayesian framework. Specifically, we reformulate the CANDECOMP/PARAFAC (CP) tensor rank and develop a sparsity-induced prior for the low-rank structure that can be used to determine the tensor rank automatically. Then, the non-low-rank structure is modeled using a mixture of Gaussians prior that is shown to be sufficiently flexible and powerful to inform the completion process for a variety of real tensor data. With these two priors, we develop a Bayesian minimum mean-squared error estimate framework for inference. The developed framework can capture the important distinctions between low-rank and non-low-rank structures, thereby enabling a more accurate model and, ultimately, more accurate completion. Across various applications, compared with the state-of-the-art methods, the proposed model yields more accurate completion results.
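For readers unfamiliar with the CP (CANDECOMP/PARAFAC) representation used in the tensor-completion entry above, the minimal numpy sketch below shows how a rank-R tensor is assembled from factor matrices and how a candidate reconstruction would be scored on the observed entries only. The random factors and observation mask are placeholders; the Bayesian inference described in the abstract is not reproduced here.

    # Sketch only: the CP representation behind low-rank tensor completion.
    import numpy as np

    rng = np.random.default_rng(0)
    I, J, K, R = 8, 9, 10, 3                              # tensor dimensions and CP rank
    A = rng.normal(size=(I, R))
    B = rng.normal(size=(J, R))
    C = rng.normal(size=(K, R))

    # A rank-R CP tensor: X[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
    X = np.einsum('ir,jr,kr->ijk', A, B, C)

    mask = rng.random(X.shape) < 0.3                      # ~30% of entries observed
    A_hat = A + 0.1 * rng.normal(size=A.shape)            # perturbed factor, standing in for an estimate
    X_hat = np.einsum('ir,jr,kr->ijk', A_hat, B, C)
    fit_on_observed = np.linalg.norm((X - X_hat)[mask])   # fit measured on observed entries only
    print(X.shape, round(float(mask.mean()), 2), round(float(fit_on_observed), 3))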
Item (Metadata only): ACPL: Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification (IEEE, 2022). Liu, F.; Tian, Y.; Chen, Y.; Liu, Y.; Belagiannis, V.; Carneiro, G.; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (18 Jun 2022 - 24 Jun 2022 : New Orleans, Louisiana)
Effective semi-supervised learning (SSL) in medical image analysis (MIA) must address two challenges: 1) work effectively on both multi-class (e.g., lesion classification) and multi-label (e.g., multiple-disease diagnosis) problems, and 2) handle imbalanced learning (because of the high variance in disease prevalence). One strategy to explore in SSL MIA is pseudo-labelling, but it has a few shortcomings. Pseudo-labelling generally has lower accuracy than consistency learning, is not specifically designed for both multi-class and multi-label problems, and can be challenged by imbalanced learning. In this paper, unlike traditional methods that select confident pseudo labels by thresholding, we propose a new SSL algorithm, called anti-curriculum pseudo-labelling (ACPL), which introduces novel techniques to select informative unlabelled samples, improving training balance and allowing the model to work for both multi-label and multi-class problems, and to estimate pseudo labels by an accurate ensemble of classifiers (improving pseudo label accuracy). We run extensive experiments to evaluate ACPL on two public medical image classification benchmarks: Chest X-Ray14 for thorax disease multi-label classification and ISIC2018 for skin lesion multi-class classification. Our method outperforms previous SOTA SSL methods on both datasets.

Item (Open Access): Active learning from noisy tagged images (BMVA Press, 2018). Abbasnejad, M.E.; Dick, A.R.; Shi, Q.; Hengel, A.V.D.; British Machine Vision Conference 2018 (BMVC 2018) (3 Sep 2018 - 6 Sep 2018 : Newcastle upon Tyne)
Learning successful image classification models requires large quantities of labelled examples that are generally hard to obtain. On the other hand, the web provides an abundance of loosely labelled images, i.e. images tagged on websites such as Flickr. Although these images are cheap and massively available, their tags are typically noisy and unreliable. In an attempt to use such images for training a classifier, we propose a simple probabilistic model to learn a latent semantic space from which deep vector representations of the images and tags are generated. This latent space is subsequently used in an active learning framework based on adaptive submodular optimisation that selects informative images to be labelled. Afterwards, we update the classifier according to the importance of each labelled image, to best capture the information it provides. Through this simple approach, we are able to train a classifier that performs well using a fraction of the effort that is typically required for image labelling and classifier training.

Item (Open Access): Active-learning accelerated computational screening of A₂B@NG catalysts for CO₂ electrochemical reduction (Elsevier, 2023). Li, X.; Li, H.; Zhang, Z.; Shi, J.Q.; Jiao, Y.; Qiao, S.-Z.
Few-atom catalysts, owing to their unique coordination structure compared with metal particles and single-atom catalysts, have the potential to be applied for efficient electrochemical CO2 reduction (CRR). In this study, we designed a class of triple-atom A2B catalysts, with two A metal atoms and one B metal atom either horizontally or vertically embedded in the nitrogen-doped graphene plane. Metals A and B were selected from 17 elements across the 3d to 5d transition metals. The structural stability and CRR activity of the 257 constructed A2B catalysts were evaluated. An active-learning approach was applied to predict the adsorption site of the key reaction intermediate *CO; it used only 40% of the computing resources of a "brute force" calculation and greatly accelerated the large amount of computation required by the large number of A2B catalysts. Our results reveal that these triple-atom catalysts can selectively produce more valuable hydrocarbon products while preserving high reactivity. Additionally, six triple-atom catalysts were proposed as potential CRR catalysts. These findings provide a theoretical understanding of the experimentally synthesized Fe3 and Ru3-N4 catalysts and lay a foundation for the future discovery of few-atom catalysts and carbon materials in other applications. A new machine learning method, the masked energy model, was also proposed, which outperforms existing methods by approximately 5% when predicting low-coverage adsorption sites.
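A generic pool-based active-learning loop of the kind referred to in the entry above (train a cheap surrogate, query the expensive calculation only where the surrogate is least certain) can be sketched as follows. The quadratic "oracle" stands in for a DFT calculation, and the random-forest committee is an illustrative surrogate, not the masked energy model proposed in the paper.

    # Sketch only: uncertainty-driven selection of candidates to send to an
    # expensive calculation. All data, models and functions are placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    pool = rng.uniform(-1, 1, size=(500, 4))             # candidate site descriptors

    def expensive_oracle(x):
        return (x ** 2).sum(axis=1)                      # placeholder for DFT energies

    labelled = list(rng.choice(len(pool), size=20, replace=False))
    for _ in range(5):                                   # a few acquisition rounds
        X, y = pool[labelled], expensive_oracle(pool[labelled])
        model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
        per_tree = np.stack([t.predict(pool) for t in model.estimators_])
        uncertainty = per_tree.std(axis=0)               # committee disagreement
        uncertainty[labelled] = -np.inf                  # do not re-query known sites
        labelled.extend(np.argsort(uncertainty)[-10:].tolist())

    print(f"queried {len(labelled)} of {len(pool)} candidates")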
Item (Metadata only): Actively seeking and learning from live data (Computer Vision Foundation / IEEE, 2019). Teney, D.; Hengel, A.V.D.; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (15 Jun 2019 - 20 Jun 2019 : Long Beach, USA)
One of the key limitations of traditional machine learning methods is their requirement for training data that exemplifies all the information to be learned. This is a particular problem for visual question answering methods, which may be asked questions about virtually anything. The approach we propose is a step toward overcoming this limitation by searching for the required information at test time. The resulting method dynamically utilizes data from an external source, such as a large set of questions/answers or images/captions. Concretely, we learn a set of base weights for a simple VQA model that are then adapted to a given question using information retrieved specifically for that question. The adaptation process leverages recent advances in gradient-based meta-learning, together with contributions for efficient retrieval and cross-domain adaptation. We surpass the state-of-the-art on the VQA-CP v2 benchmark and demonstrate our approach to be intrinsically more robust to out-of-distribution test data. We demonstrate the use of external non-VQA data using the MS COCO captioning dataset to support the answering process. This approach opens a new avenue for open-domain VQA systems that interface with diverse sources of data.

Item (Open Access): Adaptive importance learning for improving lightweight image super-resolution network (Springer, 2020). Zhang, L.; Wang, P.; Shen, C.; Liu, L.; Wei, W.; Zhang, Y.; van den Hengel, A.
Deep neural networks have achieved remarkable success in single image super-resolution (SISR). The computing and memory requirements of these methods have hindered their application to broad classes of real devices with limited computing power, however. One approach to this problem has been lightweight network architectures that balance super-resolution performance against the computational burden. In this study, we revisit this problem from an orthogonal view, and propose a novel learning strategy to maximize the pixel-wise fitting ability of a given lightweight network architecture. Considering that the initial performance of the lightweight network is very limited, we present an adaptive importance learning scheme for SISR that trains the network with an easy-to-complex paradigm by dynamically updating the importance of image pixels on the basis of the training loss. Specifically, we formulate the network training and the importance learning into a joint optimization problem. With a carefully designed importance penalty function, the importance of individual pixels can be gradually increased through solving a convex optimization problem. The training process thus begins with pixels that are easy to reconstruct, and gradually proceeds to more complex pixels as fitting improves. Furthermore, the proposed learning scheme is able to seamlessly assimilate knowledge from a more powerful teacher network in the form of importance initialization, thus obtaining better initial performance for the network. Through learning the network parameters and updating pixel importance, the proposed learning scheme enables smaller, lightweight networks to achieve better performance than has previously been possible. Extensive experiments on four benchmark datasets demonstrate the potential benefits of the proposed learning strategy in lightweight SISR network enhancement. In some cases, our learned network, with only 25% of the parameters and computational complexity, can produce results comparable to or even better than those of the corresponding full-parameter network.
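The easy-to-complex idea in the adaptive importance learning entry above amounts to weighting the per-pixel reconstruction loss and relaxing that weighting over time. The sketch below uses a simple exponential down-weighting of high-loss pixels as a stand-in for the paper's convex importance update; the tensors are random placeholders.

    # Sketch only: a per-pixel importance-weighted loss. Easy pixels (low loss)
    # get higher weight early; raising the temperature gradually brings in hard
    # pixels. The weighting rule is illustrative, not the published update.
    import torch

    def weighted_sr_loss(pred, target, temperature):
        per_pixel = (pred - target).abs()                    # per-pixel L1 loss
        with torch.no_grad():
            importance = torch.exp(-per_pixel / temperature) # easy pixels -> weight near 1
            importance = importance / importance.mean()      # keep the loss scale stable
        return (importance * per_pixel).mean()

    pred = torch.rand(2, 3, 64, 64, requires_grad=True)      # stand-in for network output
    target = torch.rand(2, 3, 64, 64)
    for epoch, temp in enumerate([0.05, 0.1, 0.5]):          # easy-to-complex schedule
        print(epoch, float(weighted_sr_loss(pred, target, temp)))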
Item (Metadata only): Adaptive neuro-surrogate-based optimisation method for wave energy converters placement optimisation (Springer Nature, 2019). Neshat, M.; Abbasnejad, E.; Shi, Q.; Alexander, B.; Wagner, M.; 26th International Conference on Neural Information Processing (ICONIP) (12 Dec 2019 - 15 Dec 2019 : Sydney, Australia); Gedeon, T.; Wong, K.W.; Lee, M.
Installed renewable energy capacity has expanded massively in recent years. Wave energy, with its high capacity factors, has great potential to complement established sources of solar and wind energy. This study explores the problem of optimising the layout of advanced, three-tether wave energy converters in a size-constrained farm in a numerically modelled ocean environment. Simulating and computing the complicated hydrodynamic interactions in wave farms can be computationally costly, which limits optimisation methods to using just a few thousand evaluations. To deal with this expensive optimisation problem, an adaptive neuro-surrogate optimisation (ANSO) method is proposed that consists of a surrogate Recurrent Neural Network (RNN) model trained with a very limited number of observations. This model is coupled with a fast meta-heuristic optimiser for adjusting the model's hyper-parameters. The trained model is applied using a greedy local search with a backtracking optimisation strategy. To evaluate the performance of the proposed approach, it is compared with some of the more popular and successful Evolutionary Algorithms (EAs) in four real wave scenarios (Sydney, Perth, Adelaide and Tasmania). Experimental results show that the adaptive neuro model is competitive with other optimisation methods in terms of total harnessed power output, and faster in terms of total computational cost.
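The greedy, surrogate-guided placement strategy described in the entry above can be pictured in a few lines: buoy positions are added one at a time, each chosen as the candidate position that a cheap surrogate scores highest. The toy surrogate below merely penalises close spacing and is only a placeholder for the trained recurrent-network surrogate; the farm size and buoy count are likewise illustrative.

    # Sketch only: greedy, surrogate-scored placement of buoys in a bounded farm.
    import numpy as np

    rng = np.random.default_rng(2)
    SIZE, N_BUOYS, N_CANDIDATES = 500.0, 4, 200          # illustrative values

    def surrogate_power(layout):
        layout = np.asarray(layout)
        d = np.linalg.norm(layout[:, None] - layout[None, :], axis=-1)
        penalty = np.exp(-d[np.triu_indices(len(layout), k=1)] / 50.0).sum()
        return len(layout) - penalty                     # toy score: close spacing penalised

    layout = [rng.uniform(0, SIZE, size=2)]              # first buoy placed at random
    for _ in range(N_BUOYS - 1):
        candidates = rng.uniform(0, SIZE, size=(N_CANDIDATES, 2))
        scores = [surrogate_power(layout + [c]) for c in candidates]
        layout.append(candidates[int(np.argmax(scores))])   # greedy: keep best candidate

    print(np.round(np.array(layout), 1), round(surrogate_power(layout), 3))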
Item (Metadata only): AffordanceNet: an end-to-end deep learning approach for object affordance detection (IEEE, 2018). Do, T.; Nguyen, A.; Reid, I.; IEEE International Conference on Robotics and Automation (ICRA) (21 May 2018 - 25 May 2018 : Brisbane, Australia)
We propose AffordanceNet, a new deep learning approach to simultaneously detect multiple objects and their affordances from RGB images. Our AffordanceNet has two branches: an object detection branch to localize and classify the object, and an affordance detection branch to assign each pixel in the object to its most probable affordance label. The proposed framework employs three key components for effectively handling the multiclass problem in the affordance mask: a sequence of deconvolutional layers, a robust resizing strategy, and a multi-task loss function. The experimental results on the public datasets show that our AffordanceNet outperforms recent state-of-the-art methods by a fair margin, while its end-to-end architecture allows inference at a speed of 150 ms per image. This makes our AffordanceNet well suited to real-time robotic applications. Furthermore, we demonstrate the effectiveness of AffordanceNet in different testing environments and in real robotic applications. The source code is available at https://github.com/nqanh/affordance-net.

Item (Open Access): AIML at VQA-Med 2020: Knowledge inference via a skeleton-based sentence mapping approach for medical domain visual question answering (CEUR-WS, 2020). Liao, Z.; Wu, Q.; Shen, C.; Van Den Hengel, A.; Verjans, J.; International Conference of the CLEF Initiative (CLEF) (22 Sep 2020 - 25 Sep 2020 : virtual online); Cappellato, L.; Eickhoff, C.; Ferro, N.; Névéol, A.
In this paper, we describe our contribution to the 2020 ImageCLEF Medical Domain Visual Question Answering (VQA-Med) challenge. Our submissions scored first place on the VQA challenge leaderboard and first place on the associated Visual Question Generation (VQG) challenge leaderboard. Our VQA approach was developed using a knowledge inference methodology called Skeleton-based Sentence Mapping (SSM). Using all the questions and answers, we derived a set of classifiable tasks and inferred the corresponding labels. As a result, we were able to transform the VQA task into a multi-task image classification problem, which allowed us to focus on the image modelling aspect. We further propose a class-wise and task-wise normalization facilitating optimization of multiple tasks in a single network. This enabled us to apply a multi-scale and multi-architecture ensemble strategy for robust prediction. Lastly, we positioned the VQG task as a transfer learning problem using the models trained for the VQA task. The VQG task was also solved using classification.

Item (Metadata only): An embarrassingly simple approach to visual domain adaptation (IEEE, 2018). Lu, H.; Shen, C.; Cao, Z.; Xiao, Y.; Van Den Hengel, A.
We show that it is possible to achieve high-quality domain adaptation without explicit adaptation. The nature of the classification problem means that when samples from the same class in different domains are sufficiently close, and samples from differing classes are separated by large enough margins, there is a high probability that each will be classified correctly. Inspired by this, we propose an embarrassingly simple yet effective approach to domain adaptation: only the class mean is used to learn class-specific linear projections. Learning these projections is naturally cast into a linear-discriminant-analysis-like framework, which gives an efficient, closed-form solution. Furthermore, to enable the application of this approach to unsupervised learning, an iterative validation strategy is developed to infer target labels. Extensive experiments on cross-domain visual recognition demonstrate that, even with the simplest formulation, our approach outperforms existing non-deep adaptation methods and exhibits classification performance comparable with that of modern deep adaptation methods. An analysis of potential issues affecting the practical application of the method is also described, including robustness, convergence, and the impact of small sample sizes.

Item (Metadata only): Approximate Fisher information matrix to characterise the training of deep neural networks (IEEE, 2020). Liao, Z.; Drummond, T.; Reid, I.; Carneiro, G.
In this paper, we introduce a novel methodology for characterising the performance of deep learning networks (ResNets and DenseNet) with respect to training convergence and generalisation as a function of mini-batch size and learning rate for image classification. This methodology is based on novel measurements derived from the eigenvalues of the approximate Fisher information matrix, which can be efficiently computed even for high capacity deep models. Our proposed measurements can help practitioners to monitor and control the training process (by actively tuning the mini-batch size and learning rate) to allow for good training convergence and generalisation. Furthermore, the proposed measurements also allow us to show that it is possible to optimise the training process with a new dynamic sampling training approach that continuously and automatically changes the mini-batch size and learning rate during the training process. Finally, we show that the proposed dynamic sampling training approach has a faster training time and a competitive classification accuracy compared to the current state of the art.
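The quantity monitored in the entry above, the spectrum of an approximate (empirical) Fisher information matrix, can be computed directly for a toy model. The brute-force logistic-regression example below is only meant to show what those eigenvalues are; the paper relies on efficient approximations that scale to deep networks.

    # Sketch only: the empirical Fisher matrix of a tiny logistic-regression
    # model, built from per-example gradients, and its eigenvalue spectrum.
    import numpy as np

    rng = np.random.default_rng(3)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    y = (X @ rng.normal(size=d) + 0.1 * rng.normal(size=n) > 0).astype(float)
    w = 0.01 * rng.normal(size=d)                        # current model parameters

    p = 1.0 / (1.0 + np.exp(-(X @ w)))                   # model predictions
    per_example_grads = (p - y)[:, None] * X             # gradient of the log-loss per example
    fisher = per_example_grads.T @ per_example_grads / n # empirical Fisher, F = mean(g g^T)
    eigvals = np.linalg.eigvalsh(fisher)                 # spectrum used to characterise training
    print("largest eigenvalue:", float(eigvals.max()))
    print("spectrum:", np.round(eigvals, 4))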
Item (Metadata only): Attention-based network for low-light image enhancement (IEEE, 2020). Zhang, C.; Yan, Q.; Zhu, Y.; Li, X.; Sun, J.; Zhang, Y.; IEEE International Conference on Multimedia and Expo (ICME) (6 Jul 2020 - 10 Jul 2020 : virtual online)
Images captured under low-light conditions often suffer from insufficient brightness and severe noise. Hence, low-light image enhancement is a key challenging task in computer vision. A variety of methods have been proposed for this task, but they often fail in extreme low-light environments and amplify the underlying noise in the input image. To address this difficult problem, this paper presents a novel attention-based neural network to generate high-quality enhanced low-light images from the raw sensor data. Specifically, we first employ an attention strategy (i.e. spatial attention and channel attention modules) to suppress undesired chromatic aberration and noise. The spatial attention module focuses on denoising by taking advantage of the non-local correlation in the image. The channel attention module guides the network to refine redundant colour features. Furthermore, we propose a new pooling layer, called the inverted shuffle layer, which adaptively selects useful information from previous features. Extensive experiments demonstrate the superiority of the proposed network in suppressing chromatic aberration and noise artifacts during enhancement, especially when the low-light image has severe noise.

Item (Metadata only): Bayesian semantic instance segmentation in open set world (Springer, 2018). Pham, T.; Vijay Kumar, B.; Do, T.; Carneiro, G.; Reid, I.; European Conference on Computer Vision (ECCV) (8 Sep 2018 - 14 Sep 2018 : Munich); Ferrari, V.; Hebert, M.; Sminchisescu, C.; Weiss, Y.
This paper addresses the semantic instance segmentation task under open-set conditions, where input images can contain known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire or even infeasible in some realistic scenarios, where the number of categories may increase boundlessly. In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes. We formulate the problem using a Bayesian framework, where the posterior distribution is approximated with a simulated annealing optimization equipped with an efficient image partition sampler. We show empirically that our method is competitive with state-of-the-art supervised methods on known classes, and also performs well on unknown classes when compared with unsupervised methods.

Item (Metadata only): Blindly assess image quality in the wild guided by a self-adaptive hyper network (IEEE, 2020). Su, S.; Yan, Q.; Zhu, Y.; Zhang, C.; Ge, X.; Sun, J.; Zhang, Y.; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (14 Jun 2020 - 19 Jun 2020 : virtual online)
Blind image quality assessment (BIQA) for authentically distorted images has always been a challenging problem, since images captured in the wild include varied content and diverse types of distortion. The vast majority of prior BIQA methods focus on how to predict synthetic image quality, but fail when applied to real-world distorted images. To deal with this challenge, we propose a self-adaptive hyper network architecture to blindly assess image quality in the wild. We separate the IQA procedure into three stages: content understanding, perception rule learning and quality prediction. After extracting image semantics, a perception rule is established adaptively by a hyper network and then adopted by a quality prediction network. In our model, image quality can be estimated in a self-adaptive manner, and the model thus generalizes well to diverse images captured in the wild. Experimental results verify that our approach not only outperforms the state-of-the-art methods on challenging authentic image databases but also achieves competitive performance on synthetic image databases, though it is not explicitly designed for the synthetic task.
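The hyper-network idea in the final entry can be reduced to a small sketch: one network maps an image's semantic features to the weights of a per-image quality head, so the "perception rule" adapts to content. All dimensions and modules below are illustrative placeholders rather than the published architecture.

    # Sketch only: a hyper network that emits the weights of a per-image
    # quality-prediction head. Features are random stand-ins for backbone outputs.
    import torch
    import torch.nn as nn

    feat_dim, hidden = 64, 32

    class TinyHyperIQA(nn.Module):
        def __init__(self):
            super().__init__()
            self.hyper = nn.Linear(feat_dim, hidden * feat_dim + hidden)  # emits head weights + bias
            self.out = nn.Linear(hidden, 1)

        def forward(self, semantic_feat, distortion_feat):
            params = self.hyper(semantic_feat)                            # (B, hidden*feat_dim + hidden)
            w = params[:, :hidden * feat_dim].view(-1, hidden, feat_dim)  # per-image weight matrices
            b = params[:, hidden * feat_dim:]                             # per-image biases
            h = torch.relu(torch.bmm(w, distortion_feat.unsqueeze(-1)).squeeze(-1) + b)
            return self.out(h).squeeze(-1)                                # one quality score per image

    model = TinyHyperIQA()
    semantic = torch.randn(4, feat_dim)
    distortion = torch.randn(4, feat_dim)
    print(model(semantic, distortion).shape)   # torch.Size([4])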