Learning representative deep features for image set analysis

dc.contributor.author: Wu, Z.
dc.contributor.author: Huang, Y.
dc.contributor.author: Wang, L.
dc.date.issued: 2015
dc.description.abstract: This paper proposes to learn features from sets of labeled raw images. With this method, over-fitting can be effectively suppressed, so that deep CNNs can be trained from scratch with a small amount of training data, i.e., 420 labeled albums with about 30,000 photos. The method can effectively handle sets of images, whether or not the sets have temporal structure. A typical approach to sequential image analysis leverages the motion between adjacent frames, whereas the proposed method focuses on capturing the co-occurrences and frequencies of features. Nevertheless, our method outperforms the previous best performers on album classification, and achieves comparable or even better performance on gait-based human identification. These results demonstrate its effectiveness and good adaptability to different kinds of set data.
dc.description.statementofresponsibility: Zifeng Wu, Yongzhen Huang, Liang Wang
dc.identifier.citation: IEEE Transactions on Multimedia, 2015; 17(11):1960-1968
dc.identifier.doi: 10.1109/TMM.2015.2477681
dc.identifier.issn: 1520-9210
dc.identifier.issn: 1941-0077
dc.identifier.uri: http://hdl.handle.net/2440/109476
dc.language.iso: en
dc.publisher: IEEE
dc.rights: © 2015 IEEE.
dc.source.uri: https://doi.org/10.1109/tmm.2015.2477681
dc.subject: Feature extraction; hidden Markov models; convolution; training data; videos; training; data models
dc.title: Learning representative deep features for image set analysis
dc.type: Journal article
pubs.publication-status: Published