Australian Institute for Machine Learning publications
Recent Submissions
Item Open Access
Discovery of Graphene Growth Alloy Catalysts Using High-Throughput Machine Learning (American Chemical Society, 2023)
Li, X.; Shi, J.Q.; Page, A.J.
Despite today’s commercial-scale graphene production using chemical vapor deposition (CVD), the growth of high-quality single-layer graphene with controlled morphology and crystallinity remains challenging. Considerable effort is still spent on designing improved CVD catalysts for producing high-quality graphene. Conventionally, however, catalyst design has been pursued using empirical intuition or trial-and-error approaches. Here, we combine high-throughput density functional theory and machine learning to identify new prospective transition metal alloy catalysts that exhibit performance comparable to that of established graphene catalysts, such as Ni(111) and Cu(111). The alloys identified through this process generally consist of combinations of early- and late-transition metals, and a majority are alloys of Ni or Cu. Nevertheless, in many cases, these conventional catalyst metals are combined with unconventional partners, such as Zr, Hf, and Nb. The work presented here therefore highlights an important new approach for identifying novel catalyst materials for the CVD growth of low-dimensional nanomaterials.

Item Open Access
Predicting progression of Parkinson’s disease motor outcomes using a multimodal combination of baseline clinical measures, neuroimaging and biofluid markers (IOS Press, 2023)
McNamara, A.; Ellul, B.; Baetu, I.-I.; Lau, S.; Jenkinson, M.; Collins-Praino, L.; 6th World Parkinson Congress (4 Jul 2023 - 7 Jul 2023 : Barcelona, Spain)
Abstract not available

Item Open Access
Active-learning accelerated computational screening of A₂B@NG catalysts for CO₂ electrochemical reduction (Elsevier, 2023)
Li, X.; Li, H.; Zhang, Z.; Shi, J.Q.; Jiao, Y.; Qiao, S.-Z.
Few-atom catalysts, due to their unique coordination structure compared to metal particles and single-atom catalysts, have the potential to be applied for efficient electrochemical CO₂ reduction (CRR). In this study, we designed a class of triple-atom A₂B catalysts, with two A metal atoms and one B metal atom either horizontally or vertically embedded in the nitrogen-doped graphene plane. Metals A and B were selected from 17 elements across the 3d to 5d transition metals. The structural stability and CRR activity of the 257 constructed A₂B catalysts were evaluated. An active-learning approach was applied to predict the adsorption site of the key reaction intermediate *CO, which used only 40% of the computing resources of a “brute force” calculation and greatly reduced the computational cost incurred by the large number of A₂B catalysts. Our results reveal that these triple-atom catalysts can selectively produce more valuable hydrocarbon products while preserving high reactivity. Additionally, six triple-atom catalysts were proposed as potential CRR catalysts. These findings provide a theoretical understanding of the experimentally synthesized Fe₃ and Ru₃-N₄ catalysts and lay a foundation for the future discovery of few-atom catalysts and carbon materials in other applications. A new machine learning method, the masked energy model, was also proposed; it outperforms existing methods by approximately 5% when predicting low-coverage adsorption sites.
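The uncertainty-driven screening loop summarised in the abstract above can be sketched generically as follows. This is a minimal illustration of active learning under a fixed compute budget, not the authors' masked energy model: the random-forest surrogate, the acquisition rule and the dft_adsorption_energy() stand-in for a real DFT calculation are all assumptions made for the example.

```python
# Generic uncertainty-sampling loop for adsorption-site screening.
# NOT the paper's method: surrogate, acquisition rule and the
# dft_adsorption_energy() hook are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def dft_adsorption_energy(x):
    """Hypothetical hook for an expensive DFT calculation."""
    raise NotImplementedError

def active_learning_screen(features, budget_fraction=0.4, n_init=20, batch=10, seed=0):
    rng = np.random.default_rng(seed)
    n = len(features)
    budget = int(budget_fraction * n)                     # e.g. 40% of all candidates
    labelled = [int(i) for i in rng.choice(n, size=n_init, replace=False)]
    energies = {i: dft_adsorption_energy(features[i]) for i in labelled}

    while len(labelled) < budget:
        model = RandomForestRegressor(n_estimators=200).fit(
            features[labelled], [energies[i] for i in labelled])
        # Spread of the ensemble's predictions as a per-site uncertainty.
        preds = np.stack([t.predict(features) for t in model.estimators_])
        uncertainty = preds.std(axis=0)
        uncertainty[labelled] = -np.inf                   # never re-select computed sites
        for i in np.argsort(uncertainty)[-batch:]:
            energies[int(i)] = dft_adsorption_energy(features[int(i)])
            labelled.append(int(i))

    # Final surrogate trained on everything computed within the budget.
    model = RandomForestRegressor(n_estimators=200).fit(
        features[labelled], [energies[i] for i in labelled])
    return model, energies
```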
Item Open Access
Visual Place Recognition: A Tutorial (Institute of Electrical and Electronics Engineers (IEEE), 2024)
Schubert, S.; Neubert, P.; Garg, S.; Milford, M.; Fischer, T.
Localization is an essential capability for mobile robots, enabling them to build a comprehensive representation of their environment and to interact with it effectively toward a goal. A rapidly growing field of research in this area is visual place recognition (VPR): the ability to recognize previously seen places in the world based solely on images.

Item Metadata only
Wind turbine power output prediction using a new hybrid neuro-evolutionary method (Elsevier, 2021)
Neshat, M.; Nezhad, M.M.; Abbasnejad, E.; Mirjalili, S.; Groppi, D.; Heydari, A.; Tjernberg, L.B.; Astiaso Garcia, D.; Alexander, B.; Shi, Q.; Wagner, M.
Abstract not available

Item Metadata only
Multi-Modal Learning With Missing Modality via Shared-Specific Feature Modelling (IEEE, 2023)
Wang, H.; Chen, Y.; Ma, C.; Avery, J.C.; Hull, M.L.; Carneiro, G.; IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (17 Jun 2023 - 24 Jun 2023 : Vancouver, Canada)
The missing modality issue is critical but non-trivial for multi-modal models to solve. Current methods aiming to handle the missing modality problem in multi-modal tasks either deal with missing modalities only during evaluation or train separate models to handle specific missing-modality settings. In addition, these models are designed for specific tasks, so, for example, classification models are not easily adapted to segmentation tasks and vice versa. In this paper, we propose the Shared-Specific Feature Modelling (ShaSpec) method, which is considerably simpler and more effective than competing approaches that address the issues above. ShaSpec is designed to take advantage of all available input modalities during training and evaluation by learning shared and specific features to better represent the input data. This is achieved with a strategy that relies on auxiliary tasks based on distribution alignment and domain classification, in addition to a residual feature fusion procedure. Also, the design simplicity of ShaSpec enables its easy adaptation to multiple tasks, such as classification and segmentation. Experiments are conducted on both medical image segmentation and computer vision classification, with results indicating that ShaSpec outperforms competing methods by a large margin. For instance, on BraTS2018, ShaSpec improves the SOTA by more than 3% for enhancing tumour, 5% for tumour core and 3% for whole tumour.

Item Metadata only
Self-supervised pseudo multi-class pre-training for unsupervised anomaly detection and segmentation in medical images (Elsevier BV, 2023)
Tian, Y.; Liu, F.; Pang, G.; Chen, Y.; Liu, Y.; Verjans, J.W.; Singh, R.; Carneiro, G.
Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) method for disease screening problems, because the training sets available for those problems usually contain only normal images. However, the exclusive reliance on normal images may result in the learning of ineffective low-dimensional image representations that are not sensitive enough to detect and segment unseen abnormal lesions of varying size, appearance, and shape. Pre-training UAD methods with self-supervised learning, based on computer vision techniques, can mitigate this challenge, but such methods are suboptimal because they do not exploit domain knowledge when designing the pretext tasks, and their contrastive learning losses do not try to cluster the normal training images, which may result in a sparse distribution of normal images that is ineffective for anomaly detection. In this paper, we propose a new self-supervised pre-training method for MIA UAD applications, named Pseudo Multi-class Strong Augmentation via Contrastive Learning (PMSACL). PMSACL consists of a novel optimisation method that contrasts a normal image class with multiple pseudo classes of synthesised abnormal images, with each class enforced to form a dense cluster in the feature space. In the experiments, we show that our PMSACL pre-training improves the accuracy of SOTA UAD methods on many MIA benchmarks using colonoscopy, fundus screening and Covid-19 chest X-ray datasets.
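The idea of contrasting a normal class against several pseudo classes of synthesised abnormal images, with each class pulled into a dense cluster, can be illustrated with a generic supervised-contrastive loss over pseudo-class labels. The sketch below shows that general idea only; it is not the exact PMSACL objective from the paper.

```python
# Generic supervised-contrastive loss over pseudo-class labels, illustrating
# per-class clustering of a normal class and synthesised-abnormal pseudo
# classes. Not the exact PMSACL objective.
import torch
import torch.nn.functional as F

def pseudo_class_contrastive_loss(features, pseudo_labels, temperature=0.1):
    """features: (N, D) embeddings; pseudo_labels: (N,) with 0 = normal and
    1..K indexing pseudo classes built from strong augmentations."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature                 # (N, N) similarities
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    # Pull each sample towards the other members of its (pseudo) class.
    loss = -log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts
    return loss.mean()
```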
Item Metadata only
Hyperdimensional Feature Fusion for Out-of-Distribution Detection (IEEE, 2023)
Wilson, S.; Fischer, T.; Sunderhauf, N.; Dayoub, F.; IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (3 Jan 2023 - 7 Jan 2023 : Waikoloa, Hawaii)
We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing works that perform OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly applying the bundling operation ⊕, we create expressive class-specific descriptor vectors for all in-distribution classes. At test time, a simple and efficient cosine similarity calculation between descriptor vectors consistently identifies OOD samples with performance competitive with the current state of the art whilst being significantly faster. We show that our method is orthogonal to recent state-of-the-art OOD detectors and can be combined with them to further improve performance.
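A simplified sketch of the fusion-and-scoring pipeline described above is given below, treating the bundling operation ⊕ as elementwise addition of projected layer features. The dimensions and the way the semi-orthogonal projections are constructed here are illustrative assumptions, not the paper's exact recipe.

```python
# Simplified hyperdimensional feature fusion for OOD detection: per-layer
# features are projected with random semi-orthogonal matrices into a common
# space and bundled (summed) into one descriptor; class descriptors are
# bundles of training descriptors, and a low maximum cosine similarity flags
# a sample as out-of-distribution. Dimensions are illustrative.
import numpy as np

def semi_orthogonal(d_in, d_out, rng):
    a = rng.standard_normal((max(d_in, d_out), min(d_in, d_out)))
    q, _ = np.linalg.qr(a)                    # orthonormal columns
    return q if d_in >= d_out else q.T        # shape (d_in, d_out)

def bundle(layer_features, projections):
    # The bundling operation (⊕) realised as a sum of projected layer features.
    return sum(f @ p for f, p in zip(layer_features, projections))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def class_descriptors(descriptors, labels, num_classes):
    # One bundled descriptor per in-distribution class.
    return [np.sum([d for d, y in zip(descriptors, labels) if y == c], axis=0)
            for c in range(num_classes)]

def ood_score(descriptor, class_descs):
    # Higher score = more likely out-of-distribution.
    return -max(cosine(descriptor, c) for c in class_descs)

# Example wiring: three layers of different widths projected to 4096 dimensions.
rng = np.random.default_rng(0)
widths, dim = [256, 512, 1024], 4096
projections = [semi_orthogonal(w, dim, rng) for w in widths]
```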
Item Metadata only
Instance-Dependent Noisy Label Learning via Graphical Modelling (IEEE, 2023)
Garg, A.; Nguyen, C.; Felix, R.; Do, T.-T.; Carneiro, G.; IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) (2 Jan 2023 - 7 Jan 2023 : Waikoloa, HI, USA)
Noisy labels are unavoidable yet troublesome in the ecosystem of deep learning because models can easily overfit them. There are many types of label noise, such as symmetric, asymmetric and instance-dependent noise (IDN), with IDN being the only type that depends on image information. Such dependence on image information makes IDN a critical type of label noise to study, given that labelling mistakes are caused in large part by insufficient or ambiguous information about the visual classes present in images. Aiming to provide an effective technique to address IDN, we present a new graphical modelling approach called InstanceGM that combines discriminative and generative models. The main contributions of InstanceGM are: i) the use of the continuous Bernoulli distribution to train the generative model, offering significant training advantages, and ii) the exploration of a state-of-the-art noisy-label discriminative classifier to generate clean labels from instance-dependent noisy-label samples. InstanceGM is competitive with current noisy-label learning approaches, particularly on IDN benchmarks using synthetic and real-world datasets, where our method shows better accuracy than the competitors in most experiments.

Item Metadata only
HOP+: History-Enhanced and Order-Aware Pre-Training for Vision-and-Language Navigation (Institute of Electrical and Electronics Engineers (IEEE), 2023)
Qiao, Y.; Qi, Y.; Hong, Y.; Yu, Z.; Wang, P.; Wu, Q.
Recent works attempt to employ pre-training in Vision-and-Language Navigation (VLN). However, these methods neglect the importance of historical contexts or ignore predicting future actions during pre-training, limiting the learning of visual-textual correspondence and the capability of decision-making. To address these problems, we present a history-enhanced and order-aware pre-training with a complementary fine-tuning paradigm (HOP+) for VLN. Specifically, besides the common Masked Language Modeling (MLM) and Trajectory-Instruction Matching (TIM) tasks, we design three novel VLN-specific proxy tasks: the Action Prediction with History (APH) task, the Trajectory Order Modeling (TOM) task and the Group Order Modeling (GOM) task. The APH task takes the visual perception trajectory into account to enhance the learning of historical knowledge as well as action prediction. The two temporal visual-textual alignment tasks, TOM and GOM, further improve the agent’s ability to reason about order. Moreover, we design a memory network to address the inconsistency of history-context representation between the pre-training and fine-tuning stages. The memory network effectively selects and summarizes historical information for action prediction during fine-tuning, without incurring large additional computation for downstream VLN tasks. HOP+ achieves new state-of-the-art performance on four downstream VLN tasks (R2R, REVERIE, RxR, and NDH), which demonstrates the effectiveness of our proposed method.

Item Metadata only
Proposal-free temporal moment localization of a natural-language query in video using guided attention (IEEE, 2020)
Rodriguez Opazo, C.; Marrese-Taylor, E.; Saleh, F.S.; Li, H.; Gould, S.; IEEE Winter Conference on Applications of Computer Vision (WACV) (1 Mar 2020 - 5 Mar 2020 : Snowmass, CO, USA)
This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a query sentence, the goal is to determine the start and end of the relevant visual moment in the video that corresponds to the query sentence. While most previous works have tackled this with a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, and proposal-free approach that is built upon three key components: a dynamic filter which adaptively transfers language information to the visual-domain attention map, a new loss function to guide the model to attend to the most relevant part of the video, and soft labels to cope with annotation uncertainties. Our method is evaluated on three standard benchmark datasets: Charades-STA, TACoS and ActivityNet-Captions. Experimental results show that our method outperforms state-of-the-art methods on these datasets, confirming its effectiveness. We believe the proposed dynamic filter-based guided attention mechanism will prove valuable for other vision-and-language tasks as well.
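The "soft labels" mentioned in the abstract above can be illustrated with a small sketch in which one-hot start/end targets are replaced by smoothed per-frame distributions. The Gaussian form and the parameter values are assumptions for illustration; the paper's exact construction is not given here.

```python
# Soft temporal-boundary targets: probability mass is spread over frames near
# the annotated start/end instead of a one-hot target. Gaussian shape and
# sigma are illustrative assumptions.
import numpy as np

def soft_boundary_target(num_frames, annotated_idx, sigma=2.0):
    t = np.arange(num_frames)
    target = np.exp(-0.5 * ((t - annotated_idx) / sigma) ** 2)
    return target / target.sum()               # normalise to a distribution

start_target = soft_boundary_target(128, annotated_idx=37)
end_target = soft_boundary_target(128, annotated_idx=52)
# Per-frame start/end scores can then be trained against these soft targets
# with, e.g., a cross-entropy or KL-divergence loss.
```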
Item Metadata only
Realisations of elliptic operators on compact manifolds with boundary (Elsevier BV, 2023)
Bandara, L.; Goffeng, M.; Saratchandran, H.
This paper investigates realisations of elliptic differential operators of general order on manifolds with boundary, following the approach of Bär-Ballmann to first order elliptic operators. The space of possible boundary values of elements in the maximal domain is described as a Hilbert space densely sandwiched between two mixed order Sobolev spaces. The description uses Calderón projectors which, in the first order case, are equivalent to results of Bär-Bandara using spectral projectors of an adapted boundary operator. Boundary conditions that induce Fredholm as well as regular realisations, and those that admit higher order regularity, are characterised. In addition, results concerning spectral theory, homotopy invariance of the Fredholm index, and well-posedness for higher order elliptic boundary value problems are proven.

Item Open Access
Hip osteoarthritis: A novel network analysis of subchondral trabecular bone structures (Oxford University Press (OUP), 2022)
Dorraki, M.; Muratovic, D.; Fouladzadeh, A.; Verjans, J.W.; Allison, A.; Findlay, D.M.; Abbott, D.; Yooseph, S.
Hip osteoarthritis (HOA) is a degenerative joint disease that leads to the progressive destruction of subchondral bone and cartilage at the hip joint. Development of effective treatments for HOA remains an open problem, primarily due to the lack of knowledge of its pathogenesis and a typically late-stage diagnosis. We describe a novel network analysis methodology for micro-computed tomography (micro-CT) images of human trabecular bone. We explored differences between the trabecular bone microstructure of femoral heads with and without HOA. Large-scale automated extraction of the network formed by trabecular bone revealed significant network properties not previously reported for bone. Profound differences were discovered, particularly in the proximal third of the femoral head, where HOA networks demonstrated elevated numbers of edges, vertices, and graph components. When further differentiating healthy joint and HOA networks, the latter showed fewer small-world network properties, due to a decreased clustering coefficient and an increased characteristic path length. Furthermore, we found that HOA networks had reduced edge lengths, indicating the formation of compressed trabecular structures. In order to assess our network approach, we developed a deep learning model for classifying HOA and control cases, and we fed it with two separate inputs: (i) micro-CT images of the trabecular bone, and (ii) the network extracted from them. The model with plain micro-CT images achieves 74.6% overall accuracy, while the model trained on the extracted networks attains 96.5% accuracy. We anticipate our findings to be a starting point for a novel description of bone microstructure in HOA, considering the phenomenon from a graph theory viewpoint.
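The network measurements reported in this abstract (vertex, edge and component counts, clustering coefficient, characteristic path length, edge length) correspond to standard graph metrics. A minimal sketch using NetworkX is shown below; extracting the trabecular network from micro-CT data is the substantive step and is assumed to have been done already.

```python
# Standard graph metrics for a trabecular-bone network (vertices = junctions,
# edges = struts). Building the graph from micro-CT is assumed done; an edge
# attribute "length" is assumed to hold the strut length.
import networkx as nx

def trabecular_network_summary(G: nx.Graph) -> dict:
    largest_cc = G.subgraph(max(nx.connected_components(G), key=len))
    edge_lengths = [d.get("length", 1.0) for _, _, d in G.edges(data=True)]
    return {
        "num_vertices": G.number_of_nodes(),
        "num_edges": G.number_of_edges(),
        "num_components": nx.number_connected_components(G),
        "clustering_coefficient": nx.average_clustering(G),
        # Characteristic path length is only defined on a connected graph,
        # so it is computed on the largest connected component.
        "characteristic_path_length": nx.average_shortest_path_length(largest_cc),
        "mean_edge_length": sum(edge_lengths) / max(len(edge_lengths), 1),
    }
```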
Item Metadata only
Improving Worst Case Visual Localization Coverage via Place-Specific Sub-Selection in Multi-Camera Systems (Institute of Electrical and Electronics Engineers, 2022)
Hausler, S.; Xu, M.; Garg, S.; Chakravarty, P.; Shrivastava, S.; Vora, A.; Milford, M.
6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images against a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical recall@0.25m-type metrics, these systems still have limited utility for real-world applications like autonomous vehicles because of their worst areas of performance: the locations where they provide insufficient recall at a given required error tolerance. Here we investigate the utility of place-specific configurations, where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines, minimizing the percentage of the dataset which has low recall at a given error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowd-sharing model of autonomous vehicle deployment, where a fleet of AVs regularly traverses a known route.

Item Metadata only
Semantic–geometric visual place recognition: a new perspective for reconciling opposing views (SAGE Publications, 2022)
Garg, S.; Suenderhauf, N.; Milford, M.
Human drivers are capable of recognizing places from a previous journey even when viewing them from the opposite direction during the return trip under radically different environmental conditions, without needing to look back or employ a 360-degree camera or LIDAR sensor. Such navigation capabilities are attributed in large part to the robust semantic scene understanding capabilities of humans. However, for an autonomous robot or vehicle, achieving such human-like visual place recognition capability presents three major challenges: (1) dealing with a limited amount of commonly observable visual content when viewing the same place from the opposite direction; (2) dealing with significant lateral viewpoint changes caused by opposing directions of travel taking place on opposite sides of the road; and (3) dealing with a radically changed scene appearance due to environmental conditions such as time of day, season, and weather. Current state-of-the-art place recognition systems have only addressed these three challenges in isolation or in pairs, typically relying on appearance-based, deep-learnt place representations. In this paper, we present a novel, semantics-based system that for the first time solves all three challenges simultaneously. We propose a hybrid image descriptor that semantically aggregates salient visual information, complemented by appearance-based description, and augment a conventional coarse-to-fine recognition pipeline with keypoint correspondences extracted from within the convolutional feature maps of a pre-trained network.
Finally, we introduce descriptor normalization and local score enhancement strategies for improving the robustness of the system. Using both existing benchmark datasets and extensive new datasets that for the first time combine the three challenges of opposing viewpoints, lateral viewpoint shifts, and extreme appearance change, we show that our system can achieve practical place recognition performance where existing state-of-the-art methods fail.

Item Metadata only
VPR-Bench: An Open-Source Visual Place Recognition Evaluation Framework with Quantifiable Viewpoint and Appearance Change (Springer Science and Business Media LLC, 2021)
Zaffar, M.; Garg, S.; Milford, M.; Kooij, J.; Flynn, D.; McDonald-Maier, K.; Ehsan, S.
Visual place recognition (VPR) is the process of recognising a previously visited place using visual information, often under varying appearance conditions and viewpoint changes and with computational constraints. VPR is related to the concepts of localisation, loop closure and image retrieval, and is a critical component of many autonomous navigation systems ranging from autonomous vehicles to drones and computer vision systems. While the concept of place recognition has been around for many years, VPR research has grown rapidly as a field over the past decade due to improving camera hardware and its potential for deep learning-based techniques, and has become a widely studied topic in both the computer vision and robotics communities. This growth, however, has led to fragmentation and a lack of standardisation in the field, especially concerning performance evaluation. Moreover, the notion of viewpoint and illumination invariance of VPR techniques has largely been assessed qualitatively, and hence ambiguously, in the past. In this paper, we address these gaps through a new comprehensive open-source framework for assessing the performance of VPR techniques, dubbed “VPR-Bench”. VPR-Bench (open-sourced at: https://github.com/MubarizZaffar/VPR-Bench) introduces two much-needed capabilities for VPR researchers: firstly, it contains a benchmark of 12 fully-integrated datasets and 10 VPR techniques, and secondly, it integrates a comprehensive variation-quantified dataset for quantifying viewpoint and illumination invariance. We apply and analyse popular evaluation metrics for VPR from both the computer vision and robotics communities, and discuss how these different metrics complement and/or replace each other, depending upon the underlying applications and system requirements. Our analysis reveals that no universal SOTA VPR technique exists, since: (a) state-of-the-art (SOTA) performance is achieved by 8 out of the 10 techniques on at least one dataset, and (b) the SOTA technique in one community does not necessarily yield SOTA performance in the other, given the differences in datasets and metrics. Furthermore, we identify key open challenges, since: (c) all 10 techniques suffer greatly in perceptually-aliased and less-structured environments, (d) all techniques suffer from viewpoint variance, where lateral change has less effect than 3D change, and (e) directional illumination change has more adverse effects on matching confidence than uniform illumination change. We also present detailed meta-analyses regarding the roles of varying ground-truths, platforms, application requirements and technique parameters. Finally, VPR-Bench provides a unified implementation to deploy these VPR techniques, metrics and datasets, and is extensible through templates.
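Much of the evaluation discussed in this entry rests on precision-recall style metrics computed from per-query match scores and correctness flags. A minimal, framework-agnostic sketch of that computation is shown below; it is not VPR-Bench's own implementation.

```python
# Precision-recall from per-query best-match scores and correctness flags.
# "Recall at 100% precision" is the largest recall reached while precision
# stays at 1.0 when sweeping the score threshold.
import numpy as np

def pr_curve(scores, is_correct):
    order = np.argsort(-np.asarray(scores, dtype=float))     # best matches first
    correct = np.asarray(is_correct, dtype=bool)[order]
    tp = np.cumsum(correct)
    fp = np.cumsum(~correct)
    precision = tp / (tp + fp)
    recall = tp / max(int(correct.sum()), 1)
    return precision, recall

def recall_at_full_precision(scores, is_correct):
    precision, recall = pr_curve(scores, is_correct)
    perfect = precision == 1.0
    return float(recall[perfect].max()) if perfect.any() else 0.0
```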
Item Metadata only
OpenSeqSLAM2.0: An Open Source Toolbox for Visual Place Recognition under Changing Conditions (IEEE, 2019)
Talbot, B.; Garg, S.; Milford, M.; IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (1 Oct 2018 - 5 Oct 2018 : Madrid, Spain); Maciejewski, A.A.; Okamura, A.; Bicchi, A.; Stachniss, C.; Song, D.Z.; Lee, D.H.; Chaumette, F.; Ding, H.; Li, J.S.; Wen, J.; Roberts, J.; Masamune, K.; Chong, N.Y.; Amato, N.; Tsagwarakis, N.; Rocco, P.; Asfour, T.; Chung, W.K.; Yasuyoshi, Y.; Sun, Y.; Maciekeski, T.; Althoefer, K.; Andrade-Cetto, J.; Chung, W.K.; Demircan, E.; Dias, J.; Fraisse, P.; Gross, R.; Harada, H.; Hasegawa, Y.; Hayashibe, M.; Kiguchi, K.; Kim, K.; Kroeger, T.; Li, Y.; Ma, S.; Mochiyama, H.; Monje, C.A.; Rekleitis, I.; Roberts, R.; Stulp, F.; Tsai, C.H.D.; Zollo, L.
Visually recognising a traversed route — regardless of whether it is seen during the day or night, in clear or inclement conditions, or in summer or winter — is an important capability for navigating robots. Since SeqSLAM was introduced in 2012, a large body of work has followed, exploring how robotic systems can use the algorithm to meet the challenges posed by navigation in changing environmental conditions. The following paper describes OpenSeqSLAM2.0, a fully open-source toolbox for visual place recognition under changing conditions. Beyond the benefits of open access to the source code, OpenSeqSLAM2.0 provides a number of tools to facilitate exploration of the visual place recognition problem and interactive parameter tuning. Using the new open source platform, it is shown for the first time how comprehensive parameter characterisations provide new insights into many of the system components previously presented in ad hoc ways, and provide users with a guide to which system component options should be used under what circumstances and why.
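The sequence-matching idea underlying SeqSLAM, on which the toolbox is built, can be sketched compactly: score each reference index by accumulating descriptor differences along a short constant-velocity trajectory through a query-versus-reference difference matrix. The sketch below is a simplified illustration, not the toolbox's implementation, and the parameter names are made up for the example.

```python
# Simplified SeqSLAM-style sequence matching: sum descriptor differences along
# a short constant-velocity trajectory through the query/reference difference
# matrix and keep the best-scoring reference index.
import numpy as np

def sequence_match(query_desc, ref_desc, seq_len=10, velocity=1.0):
    """query_desc: (Q, D), ref_desc: (R, D); returns the best reference index
    for the most recent query frame and its sequence score."""
    # Sum-of-absolute-differences matrix, shape (Q, R).
    diff = np.abs(query_desc[:, None, :] - ref_desc[None, :, :]).sum(axis=-1)
    Q, R = diff.shape
    best_score, best_ref = np.inf, -1
    for end_ref in range(R):
        score = 0.0
        for k in range(seq_len):
            q = Q - 1 - k
            r = int(round(end_ref - velocity * k))
            if q < 0 or r < 0:
                break                      # sequence would run off the matrix
            score += diff[q, r]
        else:
            if score < best_score:         # only full-length sequences compete
                best_score, best_ref = score, end_ref
    return best_ref, best_score
```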
Item Metadata only
LoST? Appearance-Invariant Place Recognition for Opposite Viewpoints using Visual Semantics (Robotics: Science and Systems Foundation, 2018)
Garg, S.; Suenderhauf, N.; Milford, M.; Robotics: Science and Systems XIV (RSS) (26 Jun 2018 - 30 Jun 2018 : Pittsburgh, PA, USA); Kress-Gazit, H.; Srinivasa, S.; Howard, T.; Atanasov, N.
Human visual scene understanding is so remarkable that we are able to recognize a revisited place when entering it from the opposite direction to that in which it was first visited, even in the presence of extreme variations in appearance. This capability is especially apparent during driving: a human driver can recognize where they are when traveling in the reverse direction along a route for the first time, without having to turn back and look. The difficulty of this problem exceeds any addressed in past appearance- and viewpoint-invariant visual place recognition (VPR) research, in part because large parts of the scene are not commonly observable from opposite directions. Consequently, as shown in this paper, the precision-recall performance of current state-of-the-art viewpoint- and appearance-invariant VPR techniques is orders of magnitude below what would be usable in a closed-loop system. Current engineered solutions predominantly rely on panoramic camera or LIDAR sensing setups; an eminently suitable engineering solution, but one that is clearly very different from how humans navigate, which also has implications for how naturally humans could interact and communicate with the navigation system. In this paper we develop a suite of novel semantic- and appearance-based techniques to enable, for the first time, high-performance place recognition in this challenging scenario. We first propose a novel Local Semantic Tensor (LoST) descriptor of images using the convolutional feature maps from a state-of-the-art dense semantic segmentation network. Then, to verify the spatial semantic arrangement of the top matching candidates, we develop a novel approach for mining semantically salient keypoint correspondences. On publicly available benchmark datasets that involve both 180-degree viewpoint change and extreme appearance change, we show how meaningful recall at 100% precision can be achieved using our proposed system where existing systems often fail to ever reach 100% precision. We also present analysis delving into the performance differences between a current and the proposed system, and characterize unique properties of the opposite-direction localization problem, including the metric matching offset. The source code is available online at https://github.com/oravus/lostX.

Item Metadata only
Look no deeper: Recognizing places from opposing viewpoints under varying scene appearance using single-view depth estimation (IEEE, 2019)
Garg, S.; Babu, M.V.; Dharmasiri, T.; Hausler, S.; Suenderhauf, N.; Kumar, S.; Drummond, T.; Milford, M.; International Conference on Robotics and Automation (ICRA) (20 May 2019 - 24 May 2019 : Montreal, Canada); Howard, A.; Althoefer, K.; Arai, F.; Arrichiello, F.; Caputo, B.; Castellanos, J.; Hauser, K.; Isler, V.; Kim, J.; Liu, H.; Oh, P.; Santos, V.; Scaramuzza, D.; Ude, A.; Voyles, R.; Yamane, K.; Okamura, A.
Visual place recognition (VPR) - the act of recognizing a familiar visual place - becomes difficult when there is extreme environmental appearance change or viewpoint change. Particularly challenging is the scenario where both phenomena occur simultaneously, such as when returning for the first time along a road at night that was previously traversed during the day in the opposite direction. While such problems can be solved with panoramic sensors, humans solve this problem regularly with limited field-of-view vision and without needing to constantly turn around. In this paper, we present a new depth- and temporal-aware visual place recognition system that solves the opposing-viewpoint, extreme appearance-change visual place recognition problem. Our system performs sequence-to-single-frame matching by extracting depth-filtered keypoints using a state-of-the-art depth estimation pipeline, constructing a keypoint sequence over multiple frames from the reference dataset, and comparing these keypoints to the keypoints extracted from a single query image. We evaluate the system on a challenging benchmark dataset and show that it consistently outperforms state-of-the-art techniques.
We also develop a range of diagnostic simulation experiments that characterize the contribution of depth-filtered keypoint sequences with respect to key domain parameters, including the degree of appearance change and camera motion.

Item Metadata only
Improving condition- and environment-invariant place recognition with semantic place categorization (IEEE, 2017)
Garg, S.; Jacobson, A.; Kumar, S.; Milford, M.; IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (24 Sep 2017 - 28 Sep 2017 : Vancouver Convention Centre, Vancouver, British Columbia, Canada); Bicchi, A.; Okamura, A.
The place recognition problem comprises two distinct subproblems: recognizing a specific location in the world (“specific” or “ordinary” place recognition) and recognizing the type of place (place categorization). Both are important competencies for mobile robots and have each received significant attention in the robotics and computer vision communities, but usually as separate areas of investigation. In this paper, we leverage the powerful complementary nature of place recognition and place categorization processes to create a new hybrid place recognition system that uses place context to inform place recognition. We show that semantic place categorization creates an informative natural segmentation of physical space that in turn enables significantly better place recognition performance in comparison to existing techniques. In particular, this new semantically-informed approach adds robustness to significant local changes within the environment, such as transitioning between indoor and outdoor environments or between dark and light rooms in a house, complementing the capabilities of current condition-invariant techniques that are robust to globally consistent change (such as day-to-night cycles). We perform experiments using 4 novel benchmark datasets and show that semantically-informed place recognition outperforms previous state-of-the-art systems. As it does for object recognition [1], we believe that semantics can play a key role in boosting conventional place recognition and navigation performance for robotic systems.
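The high-level mechanism described in this last entry, using the place category as context to constrain ordinary place recognition, can be illustrated with a toy matcher that first restricts reference candidates to those sharing the query's semantic category and then performs appearance matching. The descriptors, categories and fallback behaviour below are assumptions made for the example, not the paper's system.

```python
# Toy semantically-informed matcher: restrict reference candidates to the
# query's place category before cosine matching on appearance descriptors.
import numpy as np

def semantically_informed_match(query_desc, query_category, ref_descs, ref_categories):
    candidates = [i for i, c in enumerate(ref_categories) if c == query_category]
    if not candidates:                     # no same-category references: use all
        candidates = list(range(len(ref_descs)))
    qn = query_desc / (np.linalg.norm(query_desc) + 1e-12)
    sims = [float(qn @ (ref_descs[i] / (np.linalg.norm(ref_descs[i]) + 1e-12)))
            for i in candidates]
    best = candidates[int(np.argmax(sims))]
    return best, max(sims)
```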