Australian Institute for Machine Learning publications
Browsing Australian Institute for Machine Learning publications by Issue Date
Now showing 1 - 20 of 123
Item (Metadata only): Solving the shape-from-shading problem on the CM-5 (IEEE, 1995)
Brooks, M.J.; Chojnacki, W.; van den Hengel, A.
Computer Architectures for Machine Perception (CAMP '95), 18-20 Sep 1995, Como, Italy
We consider the problem of recovering surface shape from image shading for the situation in which a distant overhead "sun" illuminates a Lambertian surface. An iterative scheme is presented which requires no prerequisite shape information. This scheme forms the basis for a parallel algorithm implemented on a CM-5. Performance of the CM-5 implementation is compared with that of a sequential implementation running on a Sun Sparc 2. Also considered are the complexity and scalability of the parallel algorithm as a function of image size and number of processors, respectively.

Item (Metadata only): Robust techniques for the estimation of structure from motion in the uncalibrated case (Springer, 1998)
Brooks, M.J.; Chojnacki, W.; van den Hengel, A.; Baumela, L.
5th European Conference on Computer Vision (ECCV'98), 2-6 Jun 1998, Freiburg, Germany; eds: Burkhardt, H.; Neumann, B.
Robust techniques are developed for determining structure from motion in the uncalibrated case. The structure recovery is based on previous work [7] in which it was shown that a camera undergoing unknown motion and having an unknown, and possibly varying, focal length can be self-calibrated via closed-form expressions in the entries of two matrices derivable from an instantaneous optical flow field. Critical to the recovery process is the obtaining of accurate numerical estimates, up to a scalar factor, of these matrices in the presence of noisy optical flow data. We present techniques for the determination of these matrices via least-squares methods, and also a way of enforcing a dependency constraint that is imposed on these matrices. A method for eliminating outlying flow vectors is also given.
Results of experiments with real-image sequences are presented that suggest that the approach holds promise.

Item (Metadata only): Incorporating optical flow uncertainty information into a self-calibration procedure for a moving camera (SPIE, 1998)
Brooks, M.J.; Chojnacki, W.; Dick, A.; van den Hengel, A.; Kanatani, K.; Ohta, N.
Electronic Imaging '99, 28-29 Jan 1999, San Jose, CA; eds: ElHakim, S.F.; Gruen, A.
In this paper we consider robust techniques for estimating structure from motion in the uncalibrated case. We show how information describing the uncertainty of the data may be incorporated into the formulation of the problem, and we explore the situations in which this appears to be advantageous. The structure recovery technique is based on a method for self-calibrating a single moving camera from instantaneous optical flow developed in previous work of some of the authors. The method of self-calibration rests upon an equation that we term the differential epipolar equation for uncalibrated optical flow. This equation incorporates two matrices (analogous to the fundamental matrix in stereo vision) which encode information about the ego-motion and internal geometry of the camera. Any sufficiently large, non-degenerate optical flow field enables the ratio of the entries of the two matrices to be estimated. Under certain assumptions, the moving camera can be self-calibrated by means of closed-form expressions in the entries of these matrices. Reconstruction of the scene, up to a scalar factor, may then proceed using a straightforward method. The critical step in this whole approach is therefore the accurate estimation of the aforementioned ratio. To this end, the problem is couched in a least-squares minimization framework whereby candidate cost functions are derived via ordinary least squares, total least squares, and weighted least squares techniques. Various computational schemes are adopted for minimizing the cost functions.
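As a generic illustration of the weighted least squares idea used in the entry above (a minimal sketch with invented data, not the authors' flow-matrix estimator): when each observation carries a known noise level, weighting by the inverse variance down-weights the unreliable measurements, giving beta = (X^T W X)^{-1} X^T W y.

```python
import numpy as np

# Hypothetical linear model y ≈ X @ beta with per-observation noise
# levels sigma_i (all names and values here are invented for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
sigma = rng.uniform(0.1, 2.0, size=100)          # known noise level per row
y = X @ beta_true + rng.normal(size=100) * sigma

# Weighted least squares: weights are inverse variances, and the
# estimate solves the normal equations (X^T W X) beta = X^T W y.
W = np.diag(1.0 / sigma**2)
beta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
print(beta_wls)   # close to beta_true
```

With homogeneous noise (all sigma equal) this reduces to ordinary least squares; total least squares, which also treats X as noisy, requires an SVD-based formulation instead.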
Carefully devised synthetic experiments reveal that when the optical flow field is contaminated with inhomogeneous and anisotropic Gaussian noise, the best performer is the weighted least squares approach with renormalization.

Item (Metadata only): Estimating vision parameters given data with covariances (ILES Central Press, 2000)
Chojnacki, W.; Brooks, M.; Van Den Hengel, A.; Gawley, D.
11th British Machine Vision Conference, 2000, Bristol, UK; eds: Mirmehdi, M.; Thomas, B.
A new parameter estimation method is presented, applicable to many computer vision problems. It operates under the assumption that the data (typically image point locations) are accompanied by covariance matrices characterising data uncertainty. An MLE-based cost function is first formulated and a new minimisation scheme is then developed. Unlike Sampson’s method or the renormalisation technique of Kanatani, the new scheme has as its theoretical limit the true minimum of the cost function. It also has the advantages of being simply expressed, efficient, and unsurpassed in our comparative testing.

Item (Metadata only): Layer extraction with a Bayesian model of shapes (Springer, 2000)
Torr, P.; Dick, A.R.; Cipolla, R.
European Conference on Computer Vision (ECCV), 26 Jun - 1 Jul 2000, Dublin
This paper describes an automatic 3D surface modelling system that extracts dense 3D surfaces from uncalibrated video sequences. In order to extract this 3D model the scene is represented as a collection of layers, and a new method for layer extraction is described. The new segmentation method differs from previous methods in that it uses a specific prior model for layer shape. A probabilistic hierarchical model of layer shape is constructed, which assigns a density function to the shape and spatial relationships between layers. This allows accurate and efficient algorithms to be used when finding the best segmentation.
Here this framework is applied to architectural scenes, in which layers commonly correspond to windows or doors and hence belong to a tightly constrained family of shapes.

Item (Metadata only): Incorporating constraints into the design of locally identifiable calibration patterns (IEEE, 2003)
Van Den Hengel, A.; Hill, R.; Brooks, M.
IEEE International Conference on Image Processing, 2003, Barcelona, Spain; ed: Torres, L.
Camera calibration requires the identification of points in an image that correspond to known locations in the scene. These are typically determined through the use of a calibration pattern designed to facilitate feature localisation. We present in this paper a novel method of generating patterns such that each subregion is individually identifiable by its cross ratio. The method aims to minimise the probability of misidentifying a subregion. A key advantage of the method is the ability to place constraints on the size of the elements constituting the pattern. This allows a calibration object to be used in a wider variety of viewing conditions, increasing the flexibility of the calibration process.

Item (Metadata only): VideoTrace: rapid interactive scene modelling from video (Assoc Computing Machinery, 2007)
Van Den Hengel, A.; Dick, A.; Thormaehlen, T.; Ward, B.; Torr, P.
VideoTrace is a system for interactively generating realistic 3D models of objects from video---models that might be inserted into a video game, a simulation environment, or another video sequence. The user interacts with VideoTrace by tracing the shape of the object to be modelled over one or more frames of the video. By interpreting the sketch drawn by the user in light of 3D information obtained from computer vision techniques, a small number of simple 2D interactions can be used to generate a realistic 3D model. Each of the sketching operations in VideoTrace provides an intuitive and powerful means of modelling shape from video, and executes quickly enough to be used interactively.
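The cross ratio that makes each sub-region identifiable in the calibration-pattern entry above is a projective invariant of four collinear points, so it survives the perspective mapping from pattern to image. A minimal sketch (illustrative coordinates and map, not the paper's pattern generator):

```python
# Cross ratio of four collinear points given as scalar positions
# along the line: (a, b; c, d) = (AC/BC) / (AD/BD), signed distances.
def cross_ratio(a, b, c, d):
    return ((c - a) * (d - b)) / ((c - b) * (d - a))

# A 1-D projective map t -> (alpha*t + beta) / (gamma*t + delta),
# mimicking what a perspective camera does to collinear points.
# The coefficients here are arbitrary illustrative values.
def project(t, h=(2.0, 1.0, 0.5, 3.0)):
    alpha, beta, gamma, delta = h
    return (alpha * t + beta) / (gamma * t + delta)

pts = [0.0, 1.0, 3.0, 7.0]
cr_before = cross_ratio(*pts)                       # 18/14 = 9/7
cr_after = cross_ratio(*[project(t) for t in pts])
# The cross ratio is preserved under projection, so a sub-region can
# be recognised in the image by this single number.
assert abs(cr_before - cr_after) < 1e-9
```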
Immediate feedback allows the user to model rapidly those parts of the scene which are of interest and to the level of detail required. The combination of automated and manual reconstruction allows VideoTrace to model parts of the scene not visible, and to succeed in cases where purely automated approaches would fail.

Item (Metadata only): Generalised Principal Component Analysis: exploiting inherent parameter constraints (Springer, 2007)
Chojnacki, W.; van den Hengel, A.; Brooks, M.J.
1st International Conferences on Computer Vision Theory and Applications (VISAPP 2006) and Computer Graphics Theory and Applications (GRAPP 2006), 25-28 Feb 2006, Setubal, Portugal; eds: Braz, J.; Ranchordas, A.; Araujo, H.; Jorge, J.
Generalised Principal Component Analysis (GPCA) is a recently devised technique for fitting a multi-component, piecewise-linear structure to data that has found strong utility in computer vision. Unlike other methods, which intertwine the processes of estimating structure components and segmenting data points into clusters associated with putative components, GPCA estimates a multi-component structure with no recourse to data clustering. The standard GPCA algorithm searches for an estimate by minimising a simple algebraic misfit function; the underlying constraints on the model parameters are ignored. Here we promote a variant of GPCA that incorporates the parameter constraints and exploits constrained rather than unconstrained minimisation of a statistically motivated error function. The output of any GPCA algorithm hardly ever perfectly satisfies the parameter constraints. Our new version of GPCA greatly facilitates the final correction of the algorithm output to satisfy the constraints perfectly, making this step less prone to error in the presence of noise.
The method is applied to the example problem of fitting a pair of lines to noisy image points, but has potential for use in more general multi-component structure fitting in computer vision.

Item (Metadata only): Image based modelling with VideoTrace (Association for Computing Machinery, 2008)
Van Den Hengel, A.; Dick, A.
Image based modeling (IBM) combines aspects of computer vision, graphics and interface design. Practitioners in each field have approached IBM within the context of their own discipline, but recently systems have emerged that harness the strengths of each. In this article we discuss approaches to IBM, the development of our own system VideoTrace, and applications now and in the future.

Item (Metadata only): Searching in space and time: a system for forensic analysis of large video repositories (IEEE, 2008)
Van Den Hengel, A.; Hill, R.; Detmold, H.; Dick, A.
e-Forensics 2008 (1st), Adelaide, Australia; eds: Sorell, M.; White, L.
The use of surveillance cameras to monitor public buildings and urban areas is becoming increasingly widespread. Each camera delivers a continuous stream of video data, which, once archived, is a valuable source of information for forensic analysis. However, current video analysis tools are primarily based on searching backwards and forwards in time at a single location (i.e. camera), which does not account for events or people of interest that change location over time. In this paper we describe a practical system for tracking a target backwards and forwards in both space and time, effectively following a feature of interest as it moves within and between cameras in a surveillance network.
This provides a video analysis tool that is target-centred rather than camera-centred, and thus allows rapid access to the footage that matters for forensic analysis.

Item (Metadata only): A relaxation method to articulated trajectory reconstruction from monocular image sequence (IEEE, 2014)
Li, B.; Dai, Y.; He, M.; Van Den Hengel, A.
2nd IEEE China Summit & International Conference on Signal and Information Processing (ChinaSIP 2014), 9-13 Jul 2014, Xi'an, China
In this paper, we present a novel method for articulated trajectory reconstruction from a monocular image sequence. We propose a relaxation-based objective function, which utilises both smoothness and geometric constraints, posing articulated trajectory reconstruction as a non-linear optimization problem. The main advantage of this approach is that it retains the reconstructive power of the original algorithm while improving its robustness to the inevitable noise in the data. Furthermore, we present an effective approach to estimating the parameters of our objective function. Experimental results on the CMU motion capture dataset show that our proposed algorithm is effective.

Item (Metadata only): Deeply learning the messages in message passing inference (Neural Information Processing Systems, 2015)
Lin, G.; Shen, C.; Reid, I.; Van Den Hengel, A.
29th Annual Conference on Neural Information Processing Systems (NIPS 2015), 7-12 Dec 2015, Montreal; eds: Cortes, C.; Lawrence, N.; Lee, D.; Sugiyama, M.; Garnett, R.
Deep structured output learning shows great promise in tasks like semantic image segmentation. We proffer a new, efficient deep structured model learning scheme, in which we show how deep Convolutional Neural Networks (CNNs) can be used to directly estimate the messages in message passing inference for structured prediction with Conditional Random Fields (CRFs). With such CNN message estimators, we obviate the need to learn or evaluate potential functions for message calculation.
This confers significant efficiency for learning, since otherwise, when performing structured learning for a CRF with CNN potentials, it is necessary to undertake expensive inference for every stochastic gradient iteration. The network output dimension of the message estimators is the same as the number of classes, rather than growing exponentially in the order of the potentials; the approach is hence more scalable in cases where a large number of classes are involved. We apply our method to semantic image segmentation and achieve impressive performance, which demonstrates the effectiveness and usefulness of our CNN message learning method.

Item (Metadata only): MPGL: An efficient matching pursuit method for generalized LASSO (AAAI, 2017)
Gong, D.; Tan, M.; Zhang, Y.; Van Den Hengel, A.; Shi, Q.
31st AAAI Conference on Artificial Intelligence (AAAI-17), 4-9 Feb 2017, San Francisco
Unlike traditional LASSO, which enforces sparsity on the variables, Generalized LASSO (GL) enforces sparsity on a linear transformation of the variables, gaining flexibility and success in many applications. However, many existing GL algorithms do not scale up to high-dimensional problems, and/or only work well for a specific choice of the transformation. We propose an efficient Matching Pursuit Generalized LASSO (MPGL) method, which overcomes these issues and is guaranteed to converge to a global optimum. We formulate the GL problem as a convex quadratically constrained linear programming (QCLP) problem and tailor-make a cutting plane method. More specifically, our MPGL iteratively activates a subset of nonzero elements of the transformed variables, and solves a subproblem involving only the activated elements, thus gaining significant speed-up. Moreover, MPGL is less sensitive to the choice of the trade-off hyper-parameter between data fitting and regularization, and mitigates the long-standing hyper-parameter tuning issue in many existing methods.
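The Generalized LASSO objective referred to in the MPGL entry above has the form 0.5*||y - X beta||^2 + lam*||D beta||_1. A minimal sketch evaluating it for a fused-LASSO choice of D (toy data invented for illustration; this is the objective only, not the MPGL solver):

```python
import numpy as np

def gl_objective(beta, X, y, D, lam):
    # f(beta) = 0.5 * ||y - X beta||^2 + lam * ||D beta||_1
    resid = y - X @ beta
    return 0.5 * resid @ resid + lam * np.abs(D @ beta).sum()

# Fused-LASSO instance: D takes first-order differences, so the
# penalty prefers piecewise-constant coefficient vectors. With
# D = identity the same formula gives the ordinary LASSO.
X = np.eye(4)
y = np.array([1.0, 1.0, 0.0, 0.0])
D = np.diff(np.eye(4), axis=0)   # rows: e_{i+1} - e_i

beta_a = np.array([1.0, 1.0, 0.0, 0.0])   # fits y exactly, one jump
beta_b = np.array([1.0, 0.0, 1.0, 0.0])   # poor fit, three jumps
print(gl_objective(beta_a, X, y, D, lam=1.0))   # 1.0
print(gl_objective(beta_b, X, y, D, lam=1.0))   # 4.0
```

The flexibility (and the scaling difficulty) comes entirely from the choice of D: the ||D beta||_1 term is non-separable whenever D is not diagonal.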
Experiments demonstrate the superior efficiency and accuracy of the proposed method over state-of-the-art methods in both classification and image processing tasks.

Item (Metadata only): Solving constrained combinatorial optimization problems via MAP inference without high-order penalties (AAAI, 2017)
Zhang, Z.; Shi, Q.; McAuley, J.; Wei, W.; Zhang, Y.; Yao, R.; Van Den Hengel, A.
Thirty-first AAAI Conference on Artificial Intelligence (AAAI-17), 4-9 Feb 2017, San Francisco
Solving constrained combinatorial optimization problems via MAP inference is often achieved by introducing extra potential functions for each constraint. This can result in very high order potentials, e.g. a 2nd-order objective with pairwise potentials and a quadratic constraint over all N variables would correspond to an unconstrained objective with an order-N potential. This limits the practicality of such an approach, since inference with high order potentials is tractable only for a few special classes of functions. We propose an approach which is able to solve constrained combinatorial problems using belief propagation without increasing the order. For example, in our scheme the 2nd-order problem above remains order 2 instead of order N. Experiments on applications including foreground detection, image reconstruction, quadratic knapsack, and the M-best solutions problem demonstrate the effectiveness and efficiency of our method.
Moreover, we show several situations in which our approach outperforms commercial solvers like CPLEX and others designed for specific constrained MAP inference problems.

Item (Metadata only): Improving condition- and environment-invariant place recognition with semantic place categorization (IEEE, 2017)
Garg, S.; Jacobson, A.; Kumar, S.; Milford, M.
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 24-28 Sep 2017, Vancouver Convention Centre, Vancouver, British Columbia, Canada; eds: Bicchi, A.; Okamura, A.
The place recognition problem comprises two distinct subproblems: recognizing a specific location in the world (“specific” or “ordinary” place recognition) and recognizing the type of place (place categorization). Both are important competencies for mobile robots and have each received significant attention in the robotics and computer vision communities, but usually as separate areas of investigation. In this paper, we leverage the powerful complementary nature of place recognition and place categorization processes to create a new hybrid place recognition system that uses place context to inform place recognition. We show that semantic place categorization creates an informative natural segmentation of physical space that in turn enables significantly better place recognition performance than existing techniques. In particular, this new semantically informed approach adds robustness to significant local changes within the environment, such as transitioning between indoor and outdoor environments or between dark and light rooms in a house, complementing the capabilities of current condition-invariant techniques that are robust to globally consistent change (such as day-night cycles). We perform experiments using four novel benchmark datasets and show that semantically informed place recognition outperforms the previous state-of-the-art systems.
As it does for object recognition [1], we believe that semantics can play a key role in boosting conventional place recognition and navigation performance for robotic systems.

Item (Metadata only): Data-driven approximations to NP-hard problems (AAAI, 2017)
Milan, A.; Rezatofighi, S.; Garg, R.; Dick, A.; Reid, I.
Thirty-first AAAI Conference on Artificial Intelligence (AAAI-17), 4-9 Feb 2017, San Francisco
There exist a number of problem classes for which obtaining the exact solution becomes exponentially expensive with increasing problem size. The quadratic assignment problem (QAP) and the travelling salesman problem (TSP) are just two examples of such NP-hard problems. In practice, approximate algorithms are employed to obtain a suboptimal solution, where one must face a trade-off between computational complexity and solution quality. In this paper, we propose to learn to solve these problems from approximate examples, using recurrent neural networks (RNNs). Surprisingly, such architectures are capable of producing highly accurate solutions at minimal computational cost. Moreover, we introduce a simple, yet effective technique for improving the initial (weak) training set by incorporating the objective cost into the training procedure. We demonstrate the functionality of our approach on three exemplar applications: marginal distributions of a joint matching space, feature point matching and the travelling salesman problem. We show encouraging results on synthetic and real data in all three cases.

Item (Metadata only): Explicit knowledge-based reasoning for visual question answering (IJCAI, 2017)
Wang, P.; Wu, Q.; Shen, C.; Dick, A.; Van Den Hengel, A.
26th International Joint Conference on Artificial Intelligence (IJCAI-17), 19-26 Aug 2017, Melbourne; ed: Sierra, C.
We describe a method for visual question answering which is capable of reasoning about an image on the basis of information extracted from a large-scale knowledge base.
The method not only answers natural language questions using concepts not contained in the image, but can explain the reasoning by which it developed its answer. It is capable of answering far more complex questions than the predominant long short-term memory-based approach, and outperforms it significantly in testing. We also provide a dataset and a protocol by which to evaluate general visual question answering methods.

Item (Metadata only): Online multi-target tracking using recurrent neural networks (Association for the Advancement of Artificial Intelligence, 2017)
Milan, A.; Rezatofighi, H.; Dick, A.; Reid, I.; Schindler, K.
31st AAAI Conference on Artificial Intelligence (AAAI 2017), 4-9 Feb 2017, San Francisco
We present a novel approach to online multi-target tracking based on recurrent neural networks (RNNs). Tracking multiple objects in real-world scenes involves many challenges, including a) an a-priori unknown and time-varying number of targets, b) a continuous state estimation of all present targets, and c) a discrete combinatorial problem of data association. Most previous methods involve complex models that require tedious tuning of parameters. Here, we propose for the first time an end-to-end learning approach for online multi-target tracking. Existing deep learning methods are not designed for the above challenges and cannot be trivially applied to the task. Our solution addresses all of the above points in a principled way. Experiments on both synthetic and real data show promising results obtained at ~300 Hz on a standard CPU, and pave the way towards future research in this direction.

Item (Open Access): Visual Question Answering: a tutorial (IEEE, 2017)
Teney, D.; Wu, Q.; Van Den Hengel, A.
The task of visual question answering (VQA) is receiving increasing interest from researchers in both the computer vision and natural language processing fields.
Tremendous advances have been seen in the field of computer vision due to the success of deep learning, in particular on low- and mid-level tasks such as image segmentation or object recognition. These advances have fueled researchers' confidence in tackling more complex tasks that combine vision with language and high-level reasoning. VQA is a prime example of this trend. This article presents the ongoing work in the field and the current approaches to VQA based on deep learning. VQA constitutes a test for deep visual understanding and a benchmark for general artificial intelligence (AI). While the field of VQA has seen recent successes, it remains a largely unsolved task.

Item (Open Access): RRD-SLAM: Radial-distorted rolling-shutter direct SLAM (IEEE, 2017)
Kim, J.; Latif, Y.; Reid, I.
IEEE International Conference on Robotics and Automation (ICRA), 29 May - 3 Jun 2017, Singapore
In this paper, we present a monocular direct semi-dense SLAM (Simultaneous Localization And Mapping) method that can handle both radial distortion and rolling-shutter distortion. Such distortions are common in, but not restricted to, situations when an inexpensive wide-angle lens and a CMOS sensor are used, and lead to significant inaccuracy in the map and trajectory estimates if not modeled correctly. The apparently naive solution of simply undistorting the images using pre-calibrated parameters does not apply to this case, since rows in the undistorted image are no longer captured at the same time. To address this, we develop an algorithm that incorporates radial distortion into an existing state-of-the-art direct semi-dense SLAM system that takes the rolling shutter into account. We propose a method for finding the generalized epipolar curve for each rolling-shutter, radially distorted image. Our experiments demonstrate the efficacy of our approach and compare it favorably with the state of the art in direct semi-dense rolling-shutter SLAM.
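The radial distortion handled in the RRD-SLAM entry above is commonly described by a polynomial model in the squared radius of the normalised image point. A minimal generic sketch (a standard two-parameter model with invented coefficients, not the paper's specific calibration) of the forward distortion and its fixed-point inversion:

```python
# Polynomial radial distortion on normalised image coordinates:
#   x_d = x * f(r^2),  f(r^2) = 1 + k1*r^2 + k2*r^4,  r^2 = x^2 + y^2
def distort(xy, k1, k2):
    x, y = xy
    r2 = x * x + y * y
    f = 1.0 + k1 * r2 + k2 * r2 * r2
    return (x * f, y * f)

# Undistortion has no closed form for this model; a common approach
# is fixed-point iteration, starting from the distorted point.
def undistort(xy_d, k1, k2, iters=20):
    xd, yd = xy_d
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        f = 1.0 + k1 * r2 + k2 * r2 * r2
        x, y = xd / f, yd / f        # solve x * f(r(x)) = x_d iteratively
    return (x, y)

# Illustrative values: k1 < 0 gives barrel distortion (points pulled inward).
p = (0.30, -0.20)
pd = distort(p, k1=-0.25, k2=0.05)
pu = undistort(pd, k1=-0.25, k2=0.05)
assert abs(pu[0] - p[0]) < 1e-9 and abs(pu[1] - p[1]) < 1e-9
```

For moderate distortion the iteration contracts quickly; the complication RRD-SLAM addresses is that this undistortion cannot simply be applied to the whole image when each row is captured at a different time.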