    Named entity recognition for de-identifying Spanish electronic health records

    • Authors
      Moreno-Barea, Francisco J.; López-García, Guillermo; Mesa, Héctor; Ribelles, Nuria; Alba-Conejo, Emilio; Jerez-Aragonés, José Manuel; Veredas-Navarro, Francisco Javier
    • Date
      2025-02
    • Publisher
      Elsevier
    • Keywords
      Medical records - Data processing - Spanish
    • Abstract
      Background and objectives: There is increasing and renewed interest in Electronic Health Records (EHRs) as a substantial source of information for clinical decision making. Automatic de-identification of EHRs is therefore an indispensable task, because dissociating them from personal data is a prerequisite for their dissemination. Nevertheless, most prior research in this domain has used English EHRs, given the limited availability of annotated corpora in other languages, including Spanish.
      Methods: In this study, the automatic de-identification of medical documents in Spanish was explored. A private corpus of 599 genuine clinical cases was annotated with eight different categories of protected health information (PHI). The prediction problem was approached as a named entity recognition (NER) task, and two deep learning-based methodologies were developed. The first was based on recurrent neural networks (RNNs), and the second, an end-to-end approach, was based on Transformers. In addition, we implemented a procedure to expand the amount of text used for model training.
      Results: Our findings demonstrate that Transformers surpass RNNs in the de-identification of clinical data in Spanish. Particularly noteworthy is the performance of the XLM-RoBERTa large Transformer. On the expanded version of the corpus, it achieved strict-match micro-averaged scores of 0.946 for precision, 0.954 for recall, and 0.95 for F1. Furthermore, a web-based application was created to help specialized clinicians de-identify EHRs with the aid of the implemented models.
      Conclusion: The study's conclusions show the practical applicability of state-of-the-art Transformer models for precise de-identification of clinical notes in real-world Spanish medical settings. Performance could improve further if continual pre-training strategies are implemented.
    • URI
      https://hdl.handle.net/10630/35969
    • DOI
      https://dx.doi.org/10.1016/j.compbiomed.2024.109576
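The abstract reports strict-match micro-averaged precision, recall and F1. The sketch below illustrates how such scores are commonly computed. It is not the authors' evaluation code. It pools exact-match counts over all documents and PHI categories before taking the ratios. Entities are assumed to be hypothetical `(doc_id, start, end, label)` tuples with character offsets. The label names in the example (`NAME`, `DATE`, `LOCATION`, `ID`) are placeholders, since the eight categories are not listed in this record.

```python
from collections import Counter
from typing import Iterable

# Hypothetical entity representation: (doc_id, start, end, label).
# Under strict matching, all four fields must agree for a true positive.
Entity = tuple[str, int, int, str]


def strict_micro_prf(gold: Iterable[Entity], pred: Iterable[Entity]) -> tuple[float, float, float]:
    """Strict-match micro-averaged precision, recall and F1 over entity spans.

    TP/FP/FN counts are pooled across all documents and all categories
    before the ratios are computed (micro-averaging).
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(pred)
    # Multiset intersection: each gold entity can be matched at most once.
    tp = sum((gold_counts & pred_counts).values())
    fp = sum(pred_counts.values()) - tp  # predicted spans with no exact gold match
    fn = sum(gold_counts.values()) - tp  # gold spans that were missed or mis-bounded
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [("doc1", 0, 12, "NAME"), ("doc1", 30, 40, "DATE"), ("doc2", 5, 14, "LOCATION")]
    # The DATE span has wrong boundaries, so it counts as one FP and one FN under strict matching.
    pred = [("doc1", 0, 12, "NAME"), ("doc1", 30, 38, "DATE"),
            ("doc2", 5, 14, "LOCATION"), ("doc2", 20, 25, "ID")]
    p, r, f = strict_micro_prf(gold, pred)
    print(round(p, 3), round(r, 3), round(f, 3))  # prints: 0.5 0.667 0.571
```

As a quick consistency check on the reported figures, F1 is the harmonic mean of precision and recall. That gives 2 · 0.946 · 0.954 / (0.946 + 0.954) ≈ 0.950, which matches the F1 of 0.95 stated in the abstract.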
    Files
    1-s2.0-S0010482524016615-main.pdf (2.924 MB)
    Collections
    • Articles

    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA