Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/120064
Full metadata record
DC Field: Value
dc.contributor.author: Zhuang, B.
dc.contributor.author: Wu, Q.
dc.contributor.author: Shen, C.
dc.contributor.author: Reid, I.
dc.contributor.author: van den Hengel, A.
dc.date.issued: 2018
dc.identifier.citation: Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, pp.4252-4261
dc.identifier.isbn: 9781538664209
dc.identifier.issn: 2575-7075
dc.identifier.uri: http://hdl.handle.net/2440/120064
dc.description.abstract: Recognising objects according to a pre-defined, fixed set of class labels has been well studied in Computer Vision. However, there are a great many practical applications where the subjects of interest are not known beforehand, or not so easily delineated. In many of these cases natural language dialog is a natural way to specify the subject of interest, and the task of achieving this capability (a.k.a. Referring Expression Comprehension) has recently attracted attention. To this end we propose a unified framework, the ParalleL AttentioN (PLAN) network, to discover the object in an image that is referred to by natural language expressions of variable length, from short phrase queries to long multi-round dialogs. The PLAN network has two attention mechanisms that relate parts of the expression both to the global visual content and directly to the object candidates. Furthermore, the attention mechanisms are recurrent, making the referring process visualizable and explainable. The attended information from these dual sources is combined to reason about the referred object. The two attention mechanisms can be trained in parallel, and we find the combined system outperforms the state-of-the-art on several benchmark datasets with language inputs of different lengths, such as RefCOCO, RefCOCO+ and GuessWhat?!.
dc.description.statementofresponsibility: Bohan Zhuang, Qi Wu, Chunhua Shen, Ian Reid, Anton van den Hengel
dc.language.iso: en
dc.publisher: IEEE
dc.relation.ispartofseries: IEEE Conference on Computer Vision and Pattern Recognition
dc.rights: © 2018 IEEE
dc.source.uri: http://dx.doi.org/10.1109/cvpr.2018.00447
dc.title: Parallel attention: a unified framework for visual object discovery through dialogs and queries
dc.type: Conference paper
dc.contributor.conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (18 Jun 2018 - 23 Jun 2018 : Salt Lake City, UT)
dc.identifier.doi: 10.1109/CVPR.2018.00447
dc.relation.grant: http://purl.org/au-research/grants/arc/FL130100102
pubs.publication-status: Published
dc.identifier.orcid: Wu, Q. [0000-0003-3631-256X]
dc.identifier.orcid: Reid, I. [0000-0001-7790-6423]
dc.identifier.orcid: van den Hengel, A. [0000-0003-3027-8364]
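The abstract describes relating a referring expression to two visual sources via parallel attention mechanisms and fusing the attended results to reason about the referred object. As a rough illustration of that dual-source attention pattern only, here is a minimal numpy sketch; the scaled dot-product scoring, all variable names, and all feature dimensions are assumptions for illustration, not the paper's actual PLAN architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys):
    # Scaled dot-product attention (an assumed scoring function):
    # weight the key features by their similarity to the query.
    scores = query @ keys.T / np.sqrt(keys.shape[-1])
    weights = softmax(scores)
    return weights @ keys, weights

rng = np.random.default_rng(0)
d = 8  # hypothetical feature dimension
expr = rng.normal(size=(1, d))           # encoded referring expression (placeholder)
global_feats = rng.normal(size=(10, d))  # global image-region features (placeholder)
cand_feats = rng.normal(size=(5, d))     # object-candidate features (placeholder)

# Two attention branches run in parallel over the two visual sources.
ctx_global, w_global = attend(expr, global_feats)
ctx_cand, w_cand = attend(expr, cand_feats)

# Fuse the attended information from the dual sources; a downstream
# scorer would use this to pick the referred object candidate.
fused = np.concatenate([ctx_global, ctx_cand], axis=-1)  # shape (1, 2 * d)
```

Each branch produces a distribution over its own source (the weights sum to one per query), which is what makes the referring process inspectable, as the abstract notes.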
Appears in Collections: Aurora harvest 4
Australian Institute for Machine Learning publications
Computer Science publications

Files in This Item:
File: hdl_120064.pdf
Description: Submitted version
Size: 2.11 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.