    Named Entity Recognition for De-identifying Real-World Health Records in Spanish.

    • Authors
      López-García, Guillermo; Moreno-Barea, Francisco J.; Mesa, Héctor; Jerez-Aragonés, José Manuel; Ribelles, Nuria; Alba-Conejo, Emilio; Veredas-Navarro, Francisco Javier
    • Date
      2023
    • Publisher
      Springer Nature
    • Keywords
      Medical records - Access control; Data - Protection; Natural language processing (Computer science)
    • Abstract
      A growing and renewed interest has emerged in Electronic Health Records (EHRs) as a source of information for decision-making in clinical practice. In this context, the automatic de-identification of EHRs constitutes an essential task, since their dissociation from personal data is a mandatory first step before their distribution. However, the majority of previous studies on this subject have been conducted on English EHRs, due to the limited availability of annotated corpora in other languages, such as Spanish. In this study, we address the automatic de-identification of medical documents in Spanish. A private corpus of 599 real-world clinical cases has been annotated with 8 different protected health information categories. We have tackled the predictive problem as a named entity recognition task, developing two different deep learning-based methodologies: a first strategy based on recurrent neural networks (RNNs) and an end-to-end approach based on transformers. Additionally, we have developed a data augmentation procedure to increase the number of texts used to train the models. The results show that transformers outperform RNNs on the de-identification of Spanish clinical data. In particular, the best performance was obtained by the XLM-RoBERTa large transformer, with strict-match micro-averaged values of 0.946 for precision, 0.954 for recall and 0.95 for F1-score, when trained on the augmented version of the corpus. The performance achieved by transformers in this study demonstrates the viability of applying these state-of-the-art models in real-world clinical scenarios.
    • URI
      https://hdl.handle.net/10630/27313
    Files
    López-García et al. 2023 - Named Entity Recognition for De-identifying Real-World Health Records in Spanish.pdf (669.6Kb)
    Collections
    • Ponencias, Comunicaciones a congresos y Pósteres (Conference Papers, Communications, and Posters)
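
The abstract reports strict-match micro-averaged precision, recall and F1-score. As a rough illustration of how such figures are commonly computed for named entity recognition, here is a minimal sketch. It is not the authors' evaluation code: the entity representation as (start, end, category) tuples, the function name `strict_micro_prf`, and the category labels in the example are all assumptions made for illustration. Under strict matching, a predicted entity counts as correct only when its boundaries and its category both equal those of a gold entity; micro-averaging pools the counts over all documents and categories before computing the ratios.

```python
from typing import List, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, category).
Entity = Tuple[int, int, str]


def strict_micro_prf(
    gold: List[Set[Entity]], pred: List[Set[Entity]]
) -> Tuple[float, float, float]:
    """Strict-match micro-averaged precision, recall and F1 over documents."""
    tp = fp = fn = 0
    for gold_doc, pred_doc in zip(gold, pred):
        tp += len(gold_doc & pred_doc)  # exact span and category match
        fp += len(pred_doc - gold_doc)  # predicted but not in gold
        fn += len(gold_doc - pred_doc)  # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example with made-up spans and category names.
gold_entities = [
    {(0, 5, "NAME"), (10, 20, "DATE")},
    {(3, 8, "LOCATION")},
]
pred_entities = [
    {(0, 5, "NAME"), (10, 18, "DATE")},      # DATE boundary is off: no match
    {(3, 8, "LOCATION"), (12, 15, "NAME")},  # one spurious NAME
]
precision, recall, f1 = strict_micro_prf(gold_entities, pred_entities)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the toy example the DATE prediction overlaps the gold span but ends two characters early, so under strict matching it counts as both a false positive and a false negative. A lenient (overlap-based) scheme would score it differently.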
    REPOSITORIO INSTITUCIONAL UNIVERSIDAD DE MÁLAGA