Proposal-free temporal moment localization of a natural-language query in video using guided attention

Rodriguez Opazo, C.; Marrese-Taylor, E.; Saleh, F.S.; Li, H.; Gould, S.

doi:10.1109/WACV45572.2020.9093328

dc.contributor.author	Rodriguez Opazo, C.
dc.contributor.author	Marrese-Taylor, E.
dc.contributor.author	Saleh, F.S.
dc.contributor.author	Li, H.
dc.contributor.author	Gould, S.
dc.contributor.conference	IEEE Winter Conference on Applications of Computer Vision (WACV) (1 Mar 2020 - 5 Mar 2020 : Snowmass, CO, USA)
dc.date.issued	2020
dc.description.abstract	This paper studies the problem of temporal moment localization in a long untrimmed video using natural language as the query. Given an untrimmed video and a query sentence, the goal is to determine the start and end of the relevant visual moment in the video that corresponds to the query sentence. While most previous works have tackled this by a propose-and-rank approach, we introduce a more efficient, end-to-end trainable, and proposal-free approach that is built upon three key components: a dynamic filter which adaptively transfers language information to visual domain attention map, a new loss function to guide the model to attend the most relevant part of the video, and soft labels to cope with annotation uncertainties. Our method is evaluated on three standard benchmark datasets, Charades-STA, TACoS and ActivityNet-Captions. Experimental results show our method outperforms state-of-the-art methods on these datasets, confirming the effectiveness of the method. We believe the proposed dynamic filter-based guided attention mechanism will prove valuable for other vision and language tasks as well.
dc.description.statementofresponsibility	Cristian Rodriguez-Opazo, Edison Marrese-Taylor, Fatemeh Sadat Saleh, Hongdong Li, Stephen Gould
dc.identifier.citation	Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2020), 2020, pp.2453-2462
dc.identifier.doi	10.1109/WACV45572.2020.9093328
dc.identifier.isbn	9781728165530
dc.identifier.issn	2642-9381
dc.identifier.issn	2472-6737
dc.identifier.orcid	Rodriguez Opazo, C. [0000-0002-2108-3904]
dc.identifier.uri	https://hdl.handle.net/2440/138860
dc.language.iso	en
dc.publisher	IEEE
dc.relation.grant	http://purl.org/au-research/grants/arc/CE140100016
dc.relation.ispartofseries	IEEE Winter Conference on Applications of Computer Vision
dc.rights	©2020 IEEE
dc.source.uri	https://ieeexplore.ieee.org/xpl/conhome/9087828/proceeding
dc.title	Proposal-free temporal moment localization of a natural-language query in video using guided attention
dc.type	Conference paper
pubs.publication-status	Published

Collections

Australian Institute for Machine Learning publications
