Quranic script optical text recognition using deep learning in IoT systems

Badry, M.; Hassanin, M.; Chandio, A.; Moustafa, N.

doi:10.32604/cmc.2021.015489

Files

9916640273501831_12245396950001831_TSP_CMC_42154.pdf (474.45 KB)

(Published version)

Date

2021

Authors

Badry, M.

Hassanin, M.

Chandio, A.

Moustafa, N.

Type

Journal article

Citation

Computers, Materials and Continua, 2021; 68(2):1847-1858

DOI

10.32604/cmc.2021.015489

Abstract

With the worldwide spread of internet-connected devices and rapid advances in Internet of Things (IoT) systems, much research has applied machine learning methods to recognizing IoT sensor data. This is particularly the case for optical character recognition of handwritten scripts. Recognizing text in images has several useful applications, including content-based image retrieval, searching, and document archiving. Arabic is one of the most widely used languages in the world. However, Arabic text recognition in images is still at a nascent stage, especially for handwritten text. This is mainly due to the complexity of the language, differing writing styles, variations in character shapes, diacritics, and the connected nature of Arabic text. In this paper, two deep learning models are proposed. The first model is based on sequence-to-sequence recognition, while the second is based on a fully convolutional network. To measure the performance of these models, a new dataset called QTID (Quran Text Image Dataset) was devised. This is the first Arabic dataset that includes Arabic diacritics. It consists of 309,720 different 192 × 64 annotated Arabic word images, comprising 2,494,428 characters in total, taken from the Holy Quran. The annotated images in the dataset were randomly divided into 90%, 5%, and 5% sets for training, validation, and testing, respectively. Both models were set up to recognize the Arabic Othmani font in the QTID. Experimental results show that the proposed methods achieve state-of-the-art outcomes. Furthermore, the proposed models surpass expectations in terms of character recognition rate, F1-score, average precision, and recall values. They outperform leading Arabic text recognition engines such as Tesseract and ABBYY FineReader.
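
The 90% / 5% / 5% random division described in the abstract can be illustrated with a minimal sketch. The function name `split_indices`, the fixed seed, and the use of integer indices in place of actual images are assumptions made for illustration only; the record does not describe how the authors implemented the split.

```python
import random


def split_indices(n_items, percents=(90, 5, 5), seed=0):
    """Randomly partition range(n_items) into train/validation/test lists.

    Hypothetical helper: the seed and the integer-percent arithmetic are
    illustrative choices, not taken from the paper.
    """
    if sum(percents) != 100:
        raise ValueError("percents must sum to 100")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = n_items * percents[0] // 100
    n_val = n_items * percents[1] // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# 309,720 is the number of word images reported in the abstract.
train, val, test = split_indices(309_720)
print(len(train), len(val), len(test))
```

Under these assumptions, the three subsets are disjoint and together cover every image index exactly once.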

Rights

Copyright 2021 Tech Science Press. This work is licensed under a Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. (https://creativecommons.org/licenses/by/4.0/)

Published Version

https://doi.org/10.32604/cmc.2021.015489

Persistent link to this record

https://hdl.handle.net/11541.2/29261
