CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection

Lin, W.D.; Deng, Y.Y.; Gao, Y.; Wang, N.; Liu, L.Q.; Zhang, L.; Wang, P.

doi:10.1007/s11390-024-1743-6

dc.contributor.author	Lin, W.D.
dc.contributor.author	Deng, Y.Y.
dc.contributor.author	Gao, Y.
dc.contributor.author	Wang, N.
dc.contributor.author	Liu, L.Q.
dc.contributor.author	Zhang, L.
dc.contributor.author	Wang, P.
dc.date.issued	2024
dc.description.abstract	Given a query patch from a novel class, one-shot object detection aims to detect all instances of that class in a target image through semantic similarity comparison. However, because guidance for the novel class is extremely limited and the appearance difference between the query and target instances is unseen during training, it is difficult to exploit their semantic similarity appropriately and to generalize well. To mitigate this problem, we present a universal Cross-Attention Transformer (CAT) module for accurate and efficient semantic similarity comparison in one-shot object detection. The proposed CAT uses the transformer mechanism to comprehensively capture bi-directional correspondence between any paired pixels from the query and the target image, which allows their semantic characteristics to be fully exploited for accurate similarity comparison. In addition, the proposed CAT enables feature dimensionality compression for inference speedup without performance loss. Extensive experiments on three object detection datasets (MS-COCO, PASCAL VOC, and FSOD) under the one-shot setting demonstrate the effectiveness and efficiency of our model; e.g., it surpasses CoAE, a major baseline in this task, by 1.0% in average precision (AP) on MS-COCO and runs nearly 2.5 times faster.
dc.description.statementofresponsibility	Wei-Dong Lin (林蔚东), Yu-Yan Deng (邓玉岩), Yang Gao (高　扬), Ning Wang (王　宁), Ling-Qiao Liu (刘凌峤), Lei Zhang (张　磊), and Peng Wang (王　鹏)
dc.identifier.citation	Journal of Computer Science and Technology, 2024; 39(2):460-471
dc.identifier.doi	10.1007/s11390-024-1743-6
dc.identifier.issn	1000-9000
dc.identifier.issn	1860-4749
dc.identifier.uri	https://hdl.handle.net/2440/147946
dc.language.iso	en
dc.publisher	Springer
dc.relation.grant	http://purl.org/au-research/grants/arc/2021JCW-03
dc.rights	© Institute of Computing Technology, Chinese Academy of Sciences 2024
dc.source.uri	http://dx.doi.org/10.1007/s11390-024-1743-6
dc.subject	one-shot object detection; Transformer; attention mechanism
dc.title	CAT: A Simple yet Effective Cross-Attention Transformer for One-Shot Object Detection
dc.type	Journal article
pubs.publication-status	Published
