Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers

dc.centro: E.T.S.I. Informática [es_ES]
dc.contributor.author: Gallego Donoso, Fernando
dc.contributor.author: Veredas-Navarro, Francisco Javier
dc.date.accessioned: 2024-09-13T07:36:22Z
dc.date.available: 2024-09-13T07:36:22Z
dc.date.issued: 2024-08-28
dc.departamento: Lenguajes y Ciencias de la Computación
dc.description.abstract: Due to the scarcity of available annotations in the biomedical domain, clinical natural language processing poses a substantial challenge, especially when applied to low-resource languages. This paper presents our contributions for the detection and normalization of clinical entities corresponding to symptoms, signs, and findings present in multilingual clinical texts. For this purpose, the three subtasks proposed in the SympTEMIST shared task of the BioCreative VIII conference have been addressed. For Subtask 1 (named entity recognition in a Spanish corpus), an approach focused on BERT-based model assemblies pretrained on a proprietary oncology corpus was followed. Subtasks 2 and 3 of SympTEMIST address named entity linking (NEL) in Spanish and multilingual corpora, respectively. Our approach to these subtasks followed a classification strategy that starts from a bi-encoder trained by contrastive learning, for which several SapBERT-like models are explored. To apply this NEL approach to different languages, we have trained these models by leveraging the knowledge base of domain-specific medical concepts in Spanish supplied by the organizers, which we have translated into the other languages of interest by using machine translation tools. [es_ES]
dc.description.sponsorship: The authors acknowledge the support from the Ministerio de Ciencia e Innovación (MICINN) under project AEI/10.13039/501100011033. This work is also supported by the University of Malaga/CBUA funding for open access charge. [es_ES]
dc.identifier.citation: Fernando Gallego, Francisco J Veredas, Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers, Database, Volume 2024, 2024, baae087, https://doi.org/10.1093/database/baae087 [es_ES]
dc.identifier.doi: 10.1093/database/baae087
dc.identifier.uri: https://hdl.handle.net/10630/32547
dc.language.iso: eng [es_ES]
dc.rights: Atribución 4.0 Internacional (Attribution 4.0 International) [*]
dc.rights.accessRights: open access [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Informática - Salud (Computer science - Health) [es_ES]
dc.subject.other: Modelos BERT (BERT models) [es_ES]
dc.subject.other: Registros de salud electrónicos (Electronic health records) [es_ES]
dc.title: Recognition and normalization of multilingual symptom entities using in-domain-adapted BERT models and classification layers [es_ES]
dc.type: journal article [es_ES]
dc.type.hasVersion: VoR [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: b8ab3a42-65ef-4349-9230-798e19f78426
relation.isAuthorOfPublication.latestForDiscovery: b8ab3a42-65ef-4349-9230-798e19f78426

Files

Original bundle

Name: Recognition_baae087.pdf
Size: 10.62 MB
Format: Adobe Portable Document Format