Detection of tumor morphology mentions in clinical reports in Spanish using transformers

López-García, Guillermo; Jerez-Aragonés, José Manuel; Ribelles, Nuria; Alba-Conejo, Emilio; Veredas-Navarro, Francisco Javier

Files

IWANN_2021.pdf (251.93 KB)

Identifiers

URI: https://hdl.handle.net/10630/22685

Publication date

2021

Authors

López-García, Guillermo

Jerez-Aragonés, José Manuel

Ribelles, Nuria

Alba-Conejo, Emilio

Veredas-Navarro, Francisco Javier

Center

E.T.S.I. Informática

Department/Institute

Lenguajes y Ciencias de la Computación

Keywords

Oncology

Abstract

The aim of this study is to systematically examine the performance of transformer-based models for the detection of tumor morphology mentions in clinical documents in Spanish. For this purpose, we analyzed three transformer models supporting the Spanish language, namely multilingual BERT, BETO and XLM-RoBERTa. By means of a transfer-learning-based approach, the models were first pretrained on a collection of real-world oncology clinical cases with the goal of adapting the transformers to the distinctive features of the Spanish oncology domain. The resulting models were further fine-tuned on the Cantemist-NER task, addressing the detection of tumor morphology mentions as a multi-class sequence-labeling problem. To evaluate the effectiveness of the proposed approach, we compared the results obtained by the domain-specific version of the examined transformers with the performance achieved by the general-domain version of the models. The results obtained in this paper empirically demonstrated that, for every analyzed transformer, the clinical version outperformed the corresponding general-domain model on the detection of tumor morphology mentions in clinical case reports in Spanish. Additionally, the combination of the transfer-learning-based approach with an ensemble strategy exploiting the predictive capabilities of the distinct transformer architectures yielded the best results, achieving a precision of 0.893, a recall of 0.887 and an F1-score of 0.89, which surpassed the prior state-of-the-art performance for the Cantemist-NER task.
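
As a quick consistency check on the figures reported in the abstract, the F1-score is conventionally the harmonic mean of precision and recall. The sketch below assumes that standard definition; the abstract itself does not state the formula or the averaging scheme used, and the function name `f1_score` is purely illustrative rather than taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition).

    Assumption: the paper uses this conventional formula; the abstract
    does not spell it out.
    """
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for the ensemble.
reported_precision = 0.893
reported_recall = 0.887

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.890"
```

Under this assumption, the computed value rounds to 0.89, which agrees with the F1-score given in the abstract.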

Collections

Ponencias, Comunicaciones a congresos y Pósteres
