Large Language Models for Biomedical Entity Recognition and Normalization in Low-Resource Languages and Settings

dc.centro: E.T.S.I. Informática [es_ES]
dc.contributor.advisor: Veredas-Navarro, Francisco Javier
dc.contributor.author: Gallego Donoso, Fernando
dc.date.accessioned: 2025-09-11T10:14:17Z
dc.date.available: 2025-09-11T10:14:17Z
dc.date.created: 2025
dc.date.issued: 2025
dc.date.submitted: 2025-07-11
dc.departamento: Lenguajes y Ciencias de la Computación [es_ES]
dc.description.abstract: This PhD thesis focuses on the development of advanced natural language processing (NLP) solutions in the clinical domain, addressing the challenges posed by the high linguistic and structural variability of electronic health records. The rise of artificial intelligence (AI) and greater access to computational resources have enabled the analysis of large volumes of clinical texts, allowing for more precise and efficient extraction, normalization, and linking of biomedical entities. Terminological complexity, the presence of synonyms, abbreviations, and typographical errors, as well as the heterogeneity of information sources, require robust techniques such as transfer learning and continuous model adaptation. These methodologies enhance model generalization in contexts characterized by high uncertainty, data scarcity (such as rare diseases), and low-resource languages, including Spanish and other co-official languages. Furthermore, the integration of structured and unstructured sources demands adaptive and versatile solutions. This research proposes an innovative approach based on large language models (LLMs) and generative techniques, improving the extraction, normalization, and semantic linking of biomedical entities in clinical records. The developed strategies have surpassed previous state-of-the-art performance in named entity recognition (NER) and normalization (MEL), achieving top-25 accuracy above 75% on the main biomedical corpora. The results, supported by comparative studies and the publication of six scientific articles, demonstrate the impact of these technologies on optimizing clinical data analysis and lay the groundwork for future applications that will contribute to the improvement of healthcare and the advancement of biomedical NLP. [es_ES]
dc.identifier.uri: https://hdl.handle.net/10630/39855
dc.language.iso: eng [es_ES]
dc.publisher: UMA Editorial [es_ES]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.rights.accessRights: open access [es_ES]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Biomedicina - Proceso de datos - Tesis doctorales [es_ES]
dc.subject: Proceso en lenguaje natural (Informática) [es_ES]
dc.subject: Modelos lingüísticos [es_ES]
dc.subject.other: Inteligencia Artificial [es_ES]
dc.subject.other: Informática [es_ES]
dc.subject.other: Sector de la salud [es_ES]
dc.subject.other: Lingüística computacional [es_ES]
dc.title: Large Language Models for Biomedical Entity Recognition and Normalization in Low-Resource Languages and Settings [es_ES]
dc.title.alternative: Modelos masivos de lenguaje para el reconocimiento y la normalización de entidades biomédicas en idiomas y entornos con pocos recursos [es_ES]
dc.type: doctoral thesis [es_ES]
dspace.entity.type: Publication
relation.isAdvisorOfPublication: b8ab3a42-65ef-4349-9230-798e19f78426
relation.isAdvisorOfPublication.latestForDiscovery: b8ab3a42-65ef-4349-9230-798e19f78426

Files

Original bundle

Name: TD_GALLEGO_DONOSO_Fernando.pdf
Size: 2.37 MB
Format: Adobe Portable Document Format