Large Language Models for Biomedical Entity Recognition and Normalization in Low-Resource Languages and Settings

Publication date

Reading date

2025-07-11

Authors

Gallego Donoso, Fernando

Publisher

UMA Editorial

Abstract

This PhD thesis focuses on the development of advanced natural language processing (NLP) solutions in the clinical domain, addressing the challenges posed by the high linguistic and structural variability of electronic health records. The rise of artificial intelligence (AI) and greater access to computational resources have enabled the analysis of large volumes of clinical texts, allowing for more precise and efficient extraction, normalization, and linking of biomedical entities. Terminological complexity, the presence of synonyms, abbreviations, and typographical errors, as well as the heterogeneity of information sources, require robust techniques such as transfer learning and continuous model adaptation. These methodologies enhance model generalization in contexts characterized by high uncertainty, data scarcity (such as rare diseases), and low-resource languages, including Spanish and other co-official languages. Furthermore, the integration of structured and unstructured sources demands adaptive and versatile solutions. This research proposes an innovative approach based on large language models (LLMs) and generative techniques, improving the extraction, normalization, and semantic linking of biomedical entities in clinical records. The developed strategies have surpassed previous state-of-the-art performance in named entity recognition (NER) and normalization (MEL), achieving top-25 accuracy above 75% on the main biomedical corpora. The results, supported by comparative studies and the publication of six scientific articles, demonstrate the impact of these technologies on optimizing clinical data analysis and lay the groundwork for future applications that will contribute to the improvement of healthcare and the advancement of biomedical NLP.

Creative Commons license

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International