Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools

Biró, Attila; Jánosi-Rancz, Katalin Tünde; Szilágyi, László; Cuesta-Vargas, Antonio; Martín-Martín, Jaime; Szilágyi, Sándor Miklós

doi:10.3390/app12125977

Files

applsci-12-05977-v2-3.pdf (3.54 MB)

Description: Main article

Identifiers

URI: https://hdl.handle.net/10630/32668

DOI: 10.3390/app12125977

Publication date

2022-06-12

Authors

Biró, Attila

Jánosi-Rancz, Katalin Tünde

Szilágyi, László

Cuesta-Vargas, Antonio

Martín-Martín, Jaime

Szilágyi, Sándor Miklós

Publisher

MDPI

Department/Institute

Public Health and Psychiatry (Salud Pública y Psiquiatría)

Keywords

Diagnostic imaging

Abstract

Real-time multilingual sentence detection during online video presentations is needed in the healthcare sector in particular, where it supports remote diagnosis. Visual (textual) object detection and preprocessing are essential first steps for any subsequent analysis. The authors propose the DEtection TRansformer (DETR) model for accurate, real-time detection of textual objects. AI-supported real-time videoconference translation became especially important during the COVID-19 pandemic: specialists speak many different languages, so communication requires either human translators or AI-based technological channels. The accuracy with which textual elements are visually localized depends on the complexity, quality, and variety of the training datasets. The authors compare DETR with other real-time object detectors, such as YOLOv4 and Detectron2, and introduce AI-based innovations through collaborative solutions combined with OCR. In evaluations using the training datasets, visual text detection was more accurate than expected, with average accuracy ranging from 0.4 to 0.65.

Bibliographic citation

Biró, A.; Jánosi-Rancz, K.T.; Szilágyi, L.; Cuesta-Vargas, A.I.; Martín-Martín, J.; Szilágyi, S.M. Visual Object Detection with DETR to Support Video-Diagnosis Using Conference Tools. Appl. Sci. 2022, 12, 5977. https://doi.org/10.3390/app12125977

Collections

Artículos

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
