Named entity recognition for de-identifying Spanish electronic health records

dc.centro: E.T.S.I. Informática (es_ES)
dc.contributor.author: Moreno-Barea, Francisco J.
dc.contributor.author: López-García, Guillermo
dc.contributor.author: Mesa, Héctor
dc.contributor.author: Ribelles, Nuria
dc.contributor.author: Alba-Conejo, Emilio
dc.contributor.author: Jerez-Aragonés, José Manuel
dc.contributor.author: Veredas-Navarro, Francisco Javier
dc.date.accessioned: 2025-01-08T10:57:41Z
dc.date.available: 2025-01-08T10:57:41Z
dc.date.issued: 2025-02
dc.departamento: Lenguajes y Ciencias de la Computación
dc.description.abstract: Background and objectives: There is an increasing and renewed interest in Electronic Health Records (EHRs) as a substantial information source for clinical decision making. Consequently, automatic de-identification of EHRs is an indispensable task, since their dissociation from personal data is a necessary prerequisite for their dissemination. Nevertheless, the bulk of prior research in this domain has been conducted using English EHRs, given the limited availability of annotated corpora in other languages, including Spanish. Methods: In this study, the automatic de-identification of medical documents in Spanish was explored. A private corpus comprising 599 genuine clinical cases was annotated with eight different categories of protected health information. The prediction problem was approached as a named entity recognition task and two deep learning-based methodologies were developed. The first strategy was based on recurrent neural networks (RNN) and the second, an end-to-end approach, was based on Transformers. In addition, we have implemented a procedure to expand the number of texts employed for model training. Results: Our findings demonstrate that Transformers surpass RNNs in the de-identification of clinical data in Spanish. Particularly noteworthy is the excellent performance of the XLM-RoBERTa large Transformer, achieving a rigorous strict-match micro-average of 0.946 for precision, 0.954 for recall, and an F1 score of 0.95 when applied to the amplified version of the corpus. Furthermore, a web-based application has been created to assist specialized clinicians in de-identifying EHRs through the aid of the implemented models. Conclusion: The study’s conclusions showcase the practical applicability of state-of-the-art Transformer models for precise de-identification of clinical notes in real-world medical settings in Spanish, with the potential to improve performance if continual pre-training strategies are implemented. (es_ES)
dc.description.sponsorship: Funding for open access charge: Universidad de Málaga / CBUA. The authors acknowledge the support from the Ministerio de Ciencia e Innovación (MICINN) under project PID2020-116898RB-I00, from the Universidad de Málaga and Junta de Andalucía through grant UMA20-FEDERJA-045, from Pfizer S.L., the University of Malaga and the Fundación General UMA (UMA-FGUMA-Pfizer) through private funds. (es_ES)
dc.identifier.citation: Francisco J. Moreno-Barea, Guillermo López-García, Héctor Mesa, Nuria Ribelles, Emilio Alba, José M. Jerez, Francisco J. Veredas, Named entity recognition for de-identifying Spanish electronic health records, Computers in Biology and Medicine, Volume 185, 2025, 109576, ISSN 0010-4825, https://doi.org/10.1016/j.compbiomed.2024.109576. (es_ES)
dc.identifier.doi: 10.1016/j.compbiomed.2024.109576
dc.identifier.uri: https://hdl.handle.net/10630/35969
dc.language.iso: eng (es_ES)
dc.publisher: Elsevier (es_ES)
dc.rights: Attribution 4.0 International (*)
dc.rights.accessRights: open access (es_ES)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ (*)
dc.subject: Medical records - Data processing - Spanish (es_ES)
dc.subject.other: Named entity recognition (es_ES)
dc.subject.other: Natural language processing (es_ES)
dc.subject.other: De-identification (es_ES)
dc.subject.other: Electronic health records (es_ES)
dc.subject.other: Spanish (es_ES)
dc.title: Named entity recognition for de-identifying Spanish electronic health records (es_ES)
dc.type: journal article (es_ES)
dc.type.hasVersion: VoR (es_ES)
dspace.entity.type: Publication
relation.isAuthorOfPublication: 1e58df71-b337-4856-a5e8-02f8c2e8792b
relation.isAuthorOfPublication: b6f27291-58a9-4408-860c-12508516ff67
relation.isAuthorOfPublication: b8ab3a42-65ef-4349-9230-798e19f78426
relation.isAuthorOfPublication.latestForDiscovery: 1e58df71-b337-4856-a5e8-02f8c2e8792b

Files

Original bundle

Name: 1-s2.0-S0010482524016615-main.pdf
Size: 2.92 MB
Format: Adobe Portable Document Format
