Synthetized Multilanguage OCR Using CRNN and SVTR Models for Realtime Collaborative Tools

Biró, Attila; Cuesta-Vargas, Antonio; Martín-Martín, Jaime; Szilágyi, László; Miklós Szilágyi, Sándor

doi:10.3390/app13074419

Files

applsci-13-04419.pdf (2.18 MB)

Identifiers

URI: https://hdl.handle.net/10630/26967

DOI: 10.3390/app13074419

Publication date

2023-03-30

Authors

Biró, Attila

Cuesta-Vargas, Antonio

Martín-Martín, Jaime

Szilágyi, László

Miklós Szilágyi, Sándor

Publisher

MDPI

Center

Facultad de Ciencias de la Salud

Department/Institute

Fisioterapia

Keywords

Diagnostics
Optical character recognition devices

Abstract

Background: Remote diagnosis using collaborative tools has led to multilingual joint working sessions in various domains, including comprehensive health care, resulting in more inclusive health care services. One of the main challenges is providing a real-time solution for shared documents and presentations on display to improve the efficacy of noninvasive, safe, and far-reaching collaborative models. Classic optical character recognition (OCR) solutions fail when there is a mixture of languages or dialects, or when participants have different technical levels and skills. Due to the risk of misunderstandings caused by mistranslations or the interpreters' lack of domain knowledge, the technological pipeline also needs artificial intelligence (AI)-supported improvements on the OCR side. This study examines the feasibility of machine learning-supported OCR in a multilingual environment. The novelty of our method is that it provides a solution not only for different spoken languages but also for a mixture of technological languages, using artificially created vocabulary and a custom training data generation approach. Methods: A novel hybrid language vocabulary creation method is used in the OCR training process in combination with convolutional recurrent neural networks (CRNNs) and a single visual model for scene text recognition within the patch-wise image tokenization framework (SVTR). Data: In the research, we used a dedicated Python-based data generator built on collaborative tool-based templates to cover and simulate the real-life variances of remote diagnosis and co-working collaborative sessions with high accuracy. The generated training datasets ranged from 66 k to 8.5 M in size. Twenty-one research results were analyzed. Instruments: Training was conducted using tuned PaddleOCR with CRNN and SVTR modeling and a domain-specific, customized vocabulary. [...]

Bibliographic citation

Biró A, Cuesta-Vargas AI, Martín-Martín J, Szilágyi L, Szilágyi SM. Synthetized Multilanguage OCR Using CRNN and SVTR Models for Realtime Collaborative Tools. Applied Sciences. 2023; 13(7):4419. https://doi.org/10.3390/app13074419

Collections

Articles

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 4.0 International
