Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers

Gallego Donoso, Fernando; Veredas-Navarro, Francisco Javier

doi:10.1093/database/baae087

Files

Recognition_baae087.pdf (10.62 MB)

Identifiers

URI: https://hdl.handle.net/10630/32547

DOI: 10.1093/database/baae087

Publication date

2024-08-28

Center

E.T.S.I. Informática

Department/Institute

Lenguajes y Ciencias de la Computación

Keywords

Computer Science - Health

Abstract

Due to the scarcity of available annotations in the biomedical domain, clinical natural language processing poses a substantial challenge, especially when applied to low-resource languages. This paper presents our contributions to the detection and normalization of clinical entities corresponding to symptoms, signs, and findings in multilingual clinical texts. To this end, we addressed the three subtasks proposed in the SympTEMIST shared task of BioCreative VIII. For Subtask 1 (named entity recognition in a Spanish corpus), we followed an approach based on ensembles of BERT-based models pretrained on a proprietary oncology corpus. Subtasks 2 and 3 of SympTEMIST address named entity linking (NEL) in Spanish and multilingual corpora, respectively. For these subtasks, we followed a classification strategy built on a bi-encoder trained by contrastive learning, exploring several SapBERT-like models. To apply this NEL approach to other languages, we trained these models on the knowledge base of domain-specific medical concepts in Spanish supplied by the organizers, which we translated into the other target languages using machine translation tools.
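The NEL approach summarized above starts from a bi-encoder: clinical mentions and knowledge-base concept names are embedded by the same encoder, so a mention can be matched to the concept whose embedding is closest. The following is a minimal sketch of only that retrieval step, assuming the embeddings have already been produced by some SapBERT-like encoder. The toy vectors, the concept codes `C_FEVER` and `C_COUGH`, and the function name `link_mentions` are hypothetical; the sketch does not reproduce the authors' contrastive training or their classification layers.

```python
import math
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def link_mentions(
    mention_vecs: Dict[str, Vector],
    concept_vecs: Dict[str, Vector],
) -> Dict[str, Tuple[str, float]]:
    """Link each mention to the concept with the highest cosine similarity.

    Assumes both dictionaries hold embeddings from the same (hypothetical)
    bi-encoder, so mentions and concept names share one vector space.
    concept_vecs must be non-empty, otherwise max() raises ValueError.
    """
    links: Dict[str, Tuple[str, float]] = {}
    for mention, m_vec in mention_vecs.items():
        best_code, best_score = max(
            ((code, cosine(m_vec, c_vec)) for code, c_vec in concept_vecs.items()),
            key=lambda pair: pair[1],
        )
        links[mention] = (best_code, best_score)
    return links


# Hypothetical 3-dimensional toy embeddings; real BERT encoders output far larger vectors.
concepts = {
    "C_FEVER": [0.9, 0.1, 0.0],
    "C_COUGH": [0.0, 0.2, 0.95],
}
mentions = {
    "fiebre alta": [0.8, 0.3, 0.1],
    "tos seca": [0.1, 0.1, 0.9],
}

if __name__ == "__main__":
    for mention, (code, score) in link_mentions(mentions, concepts).items():
        print(f"{mention!r} -> {code} (cosine={score:.3f})")
```

Translating the knowledge base, as the abstract describes, would in this picture simply add more concept-name embeddings per code (one per language), while the nearest-neighbour lookup stays unchanged.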

Bibliographic citation

Fernando Gallego, Francisco J Veredas, Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers, Database, Volume 2024, 2024, baae087, https://doi.org/10.1093/database/baae087

Collections

Articles

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution 4.0 International
