Please use this identifier to cite or link to this item:
Scopus Web of Science® Altmetric
Type: Conference paper
Title: An end-to-end textspotter with explicit alignment and attention
Author: He, T.
Tian, Z.
Huang, W.
Shen, C.
Qiao, Y.
Sun, C.
Citation: Proceedings / CVPR, IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2018, pp.5020-5029
Publisher: IEEE
Issue Date: 2018
Series/Report no.: IEEE Conference on Computer Vision and Pattern Recognition
ISBN: 9781538664209
ISSN: 2575-7075
Conference Name: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (18 Jun 2018 - 23 Jun 2018 : Salt Lake City, UT)
Statement of
Tong He, Zhi Tian, Weilin Huang, Chunhua Shen, Yu Qiao, Changming Sun
Abstract: Text detection and recognition in natural images have long been considered as two separate tasks that are processed sequentially. Jointly training two tasks is non-trivial due to significant differences in learning difficulties and convergence rates. In this work, we present a conceptually simple yet efficient framework that simultaneously processes the two tasks in a united framework. Our main contributions are three-fold: (1) we propose a novel text-alignment layer that allows it to precisely compute convolutional features of a text instance in arbitrary orientation, which is the key to boost the performance; (2) a character attention mechanism is introduced by using character spatial information as explicit supervision, leading to large improvements in recognition; (3) two technologies, together with a new RNN branch for word recognition, are integrated seamlessly into a single model which is end-to-end trainable. This allows the two tasks to work collaboratively by sharing convolutional features, which is critical to identify challenging text instances. Our model obtains impressive results in end-to-end recognition on the ICDAR 2015 [19], significantly advancing the most recent results [2], with improvements of F-measure from (0.54, 0.51, 0.47) to (0.82, 0.77, 0.63), by using a strong, weak and generic lexicon respectively. Thanks to joint training, our method can also serve as a good detector by achieving a new state-of-the-art detection performance on related benchmarks. Code is available at
Rights: © 2018 IEEE
DOI: 10.1109/CVPR.2018.00527
Appears in Collections:Aurora harvest 8
Computer Science publications

Files in This Item:
There are no files associated with this item.

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.