Structured learning of human interactions in TV shows

Patron-Perez, A.; Marszalek, M.; Reid, I.; Zisserman, A.

doi:10.1109/TPAMI.2012.24

dc.contributor.author	Patron-Perez, A.
dc.contributor.author	Marszalek, M.
dc.contributor.author	Reid, I.
dc.contributor.author	Zisserman, A.
dc.date.issued	2012
dc.description.abstract	The objective of this work is recognition and spatiotemporal localization of two-person interactions in video. Our approach is person-centric. As a first stage we track all upper bodies and heads in a video using a tracking-by-detection approach that combines detections with KLT tracking and clique partitioning, together with occlusion detection, to yield robust person tracks. We develop local descriptors of activity based on the head orientation (estimated using a set of pose-specific classifiers) and the local spatiotemporal region around them, together with global descriptors that encode the relative positions of people as a function of interaction type. Learning and inference on the model uses a structured output SVM which combines the local and global descriptors in a principled manner. Inference using the model yields information about which pairs of people are interacting, their interaction class, and their head orientation (which is also treated as a variable, enabling mistakes in the classifier to be corrected using global context). We show that inference can be carried out with polynomial complexity in the number of people, and describe an efficient algorithm for this. The method is evaluated on a new dataset comprising 300 video clips acquired from 23 different TV shows and on the benchmark UT-Interaction dataset.
dc.description.statementofresponsibility	Alonso Patron-Perez, Marcin Marszalek, Ian Reid, and Andrew Zisserman
dc.identifier.citation	IEEE Transactions on Pattern Analysis and Machine Intelligence, 2012; 34(12):2441-2453
dc.identifier.doi	10.1109/TPAMI.2012.24
dc.identifier.issn	0162-8828
dc.identifier.issn	1939-3539
dc.identifier.orcid	Reid, I. [0000-0001-7790-6423]
dc.identifier.uri	http://hdl.handle.net/2440/87259
dc.language.iso	en
dc.publisher	IEEE
dc.rights	© 2012 IEEE
dc.source.uri	https://doi.org/10.1109/tpami.2012.24
dc.subject	Human interaction recognition; video retrieval; structured SVM
dc.title	Structured learning of human interactions in TV shows
dc.type	Journal article
pubs.publication-status	Published

Files

Original bundle

Name: RA_hdl_87259.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format
Description: Restricted Access

Collections

Computer Science publications