Named Entity Recognition for De-identifying Real-World Health Records in Spanish.

dc.centro: E.T.S.I. Informática [es_ES]
dc.contributor.author: López-García, Guillermo
dc.contributor.author: Moreno-Barea, Francisco J.
dc.contributor.author: Mesa, Héctor
dc.contributor.author: Jerez-Aragonés, José Manuel
dc.contributor.author: Ribelles, Nuria
dc.contributor.author: Alba-Conejo, Emilio
dc.contributor.author: Veredas-Navarro, Francisco Javier
dc.date.accessioned: 2023-07-20T09:31:34Z
dc.date.available: 2023-07-20T09:31:34Z
dc.date.created: 2023-07-03
dc.date.issued: 2023
dc.departamento: Lenguajes y Ciencias de la Computación
dc.description.abstract: A growing and renewed interest has emerged in Electronic Health Records (EHRs) as a source of information for decision-making in clinical practice. In this context, the automatic de-identification of EHRs constitutes an essential task, since their dissociation from personal data is a mandatory first step before their distribution. However, the majority of previous studies on this subject have been conducted on English EHRs, due to the limited availability of annotated corpora in other languages, such as Spanish. In this study, we addressed the automatic de-identification of medical documents in Spanish. A private corpus of 599 real-world clinical cases has been annotated with 8 different protected health information categories. We have tackled the predictive problem as a named entity recognition task, developing two different deep learning-based methodologies, namely a first strategy based on recurrent neural networks (RNN) and an end-to-end approach based on transformers. Additionally, we have developed a data augmentation procedure to increase the number of texts used to train the models. The results obtained show that transformers outperform RNN on the de-identification of Spanish clinical data. In particular, the best performance was obtained by the XLM-RoBERTa large transformer, with a strict-match micro-averaged value of 0.946 for precision, 0.954 for recall and 0.95 for F1-score, when trained on the augmented version of the corpus. The performance achieved by transformers in this study proves the viability of applying these state-of-the-art models in real-world clinical scenarios. [es_ES]
dc.description.sponsorship: The authors acknowledge the support from the Ministerio de Economía y Empresa (MINECO) through grant TIN2017-88728-C2-1-R, from the Ministerio de Ciencia e Innovación (MICINN) under project PID2020-116898RB-I00, from the Universidad de Málaga and Junta de Andalucía through grant UMA20-FEDERJA-045, from the Malaga-Pfizer consortium for AI research in Cancer - MAPIC, from the Instituto de Investigación Biomédica de Málaga - IBIMA (all including FEDER funds) and from Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. [es_ES]
dc.identifier.citation: López-García, G. et al. (2023). Named Entity Recognition for De-identifying Real-World Health Records in Spanish. In: Mikyška, J., de Mulatier, C., Paszynski, M., Krzhizhanovskaya, V.V., Dongarra, J.J., Sloot, P.M. (eds) Computational Science – ICCS 2023. ICCS 2023. Lecture Notes in Computer Science, vol 10475. Springer, Cham. https://doi.org/10.1007/978-3-031-36024-4_17 [es_ES]
dc.identifier.uri: https://hdl.handle.net/10630/27313
dc.language.iso: eng [es_ES]
dc.publisher: Springer Nature [es_ES]
dc.relation.eventdate: 03/07/2023-05/07/2023 [es_ES]
dc.relation.eventplace: Prague, Czech Republic [es_ES]
dc.relation.eventtitle: International Conference on Computational Science (ICCS 2023) [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.subject: Medical records - Access control [es_ES]
dc.subject: Data - Protection [es_ES]
dc.subject: Natural language processing (Computer science) [es_ES]
dc.subject.other: Named Entity Recognition [es_ES]
dc.subject.other: Natural Language Processing [es_ES]
dc.subject.other: Electronic Health Records [es_ES]
dc.subject.other: De-Identification [es_ES]
dc.title: Named Entity Recognition for De-identifying Real-World Health Records in Spanish. [es_ES]
dc.type: conference output [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: b6f27291-58a9-4408-860c-12508516ff67
relation.isAuthorOfPublication: 1e58df71-b337-4856-a5e8-02f8c2e8792b
relation.isAuthorOfPublication: b8ab3a42-65ef-4349-9230-798e19f78426
relation.isAuthorOfPublication.latestForDiscovery: b6f27291-58a9-4408-860c-12508516ff67

Files

Original bundle

Name: López-García et al. 2023 - Named Entity Recognition for De-identifying Real-World Health Records in Spanish.pdf
Size: 669.68 KB
Format: Adobe Portable Document Format