Please use this identifier to cite or link to this item: https://hdl.handle.net/2440/109133
Full metadata record
dc.contributor.author: Yu, L.
dc.contributor.author: Yang, Y.
dc.contributor.author: Huang, Z.
dc.contributor.author: Wang, P.
dc.contributor.author: Song, J.
dc.contributor.author: Shen, H.T.
dc.date.issued: 2016
dc.identifier.citation: IEEE Transactions on Image Processing, 2016; 25(12):5689-5701
dc.identifier.issn: 1057-7149
dc.identifier.issn: 1941-0042
dc.identifier.uri: http://hdl.handle.net/2440/109133
dc.description.abstract: In recent years, the task of event recognition from videos has attracted increasing interest in the multimedia community. Most existing research has focused on exploring visual cues to handle relatively small-granularity events, yet it is difficult to analyze video content directly without any prior knowledge. Therefore, combining visual and semantic analysis is a natural way to approach video event understanding. In this paper, we study the problem of Web video event recognition, where Web videos often describe large-granularity events and carry limited textual information. Key challenges include how to accurately represent event semantics from incomplete textual information and how to effectively explore the correlation between visual and textual cues for video event understanding. We propose a novel framework to perform complex event recognition from Web videos. To compensate for the insufficient expressive power of visual cues, we construct an event knowledge base by deeply mining semantic information from ubiquitous Web documents. This event knowledge base is capable of describing each event with comprehensive semantics. By utilizing this base, the textual cues for a video can be significantly enriched. Furthermore, we introduce a two-view adaptive regression model that explores the intrinsic correlation between the visual and textual cues of videos to learn reliable classifiers. Extensive experiments on two real-world video data sets show the effectiveness of our proposed framework and prove that the event knowledge base indeed helps improve the performance of Web video event recognition.
dc.description.statementofresponsibility: Litao Yu, Yang Yang, Zi Huang
dc.language.iso: en
dc.publisher: IEEE
dc.rights: © 2016 IEEE
dc.source.uri: http://dx.doi.org/10.1109/tip.2016.2614136
dc.subject: Video event recognition; event knowledge base; two-view adaptive regression
dc.title: Web video event recognition by semantic analysis from ubiquitous documents
dc.type: Journal article
dc.identifier.doi: 10.1109/TIP.2016.2614136
dc.relation.grant: http://purl.org/au-research/grants/arc/DP130103252
dc.relation.grant: http://purl.org/au-research/grants/arc/FT120100718
dc.relation.grant: ZYGX2014Z007
dc.relation.grant: ZYGX2015J055
pubs.publication-status: Published
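As a rough illustration of the two-view idea described in the abstract, the sketch below trains one regressor per view (visual and textual) and fuses their scores. It is a minimal stand-in, not the paper's two-view adaptive regression model: plain ridge regression with equal-weight late fusion replaces the coupled objective, and the feature dimensions and data (X_vis, X_txt, y) are hypothetical placeholders.

    # Minimal sketch of a two-view (visual + textual) event classifier.
    # Illustration only: independent ridge regressors with averaged
    # scores, NOT the paper's two-view adaptive regression model.
    import numpy as np

    def ridge_fit(X, y, lam=1.0):
        """Closed-form ridge regression: w = (X^T X + lam*I)^-1 X^T y."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    rng = np.random.default_rng(0)
    n, d_vis, d_txt = 200, 128, 64          # hypothetical sizes
    X_vis = rng.normal(size=(n, d_vis))     # visual features per video
    X_txt = rng.normal(size=(n, d_txt))     # enriched textual features
    y = rng.integers(0, 2, size=n) * 2 - 1  # binary event labels in {-1, +1}

    w_vis = ridge_fit(X_vis, y.astype(float))
    w_txt = ridge_fit(X_txt, y.astype(float))

    # Late fusion: average the two views' scores, then threshold at zero.
    scores = 0.5 * (X_vis @ w_vis) + 0.5 * (X_txt @ w_txt)
    pred = np.where(scores >= 0, 1, -1)
    print("training accuracy:", (pred == y).mean())

In the paper's setting, the textual view would be built from the event knowledge base rather than raw video metadata, and the two views would be learned jointly so that their correlation regularizes both classifiers.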
Appears in Collections:Aurora harvest 8
Electrical and Electronic Engineering publications

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.