Detection of tumor morphology mentions in clinical reports in Spanish using transformers

dc.centro: E.T.S.I. Informática [es_ES]
dc.contributor.author: López-García, Guillermo
dc.contributor.author: Jerez-Aragonés, José Manuel
dc.contributor.author: Ribelles, Nuria
dc.contributor.author: Alba-Conejo, Emilio
dc.contributor.author: Veredas-Navarro, Francisco Javier
dc.date.accessioned: 2021-07-23T06:19:22Z
dc.date.available: 2021-07-23T06:19:22Z
dc.date.created: 2021-07-22
dc.date.issued: 2021
dc.departamento: Lenguajes y Ciencias de la Computación
dc.description.abstract: The aim of this study is to systematically examine the performance of transformer-based models for the detection of tumor morphology mentions in clinical documents in Spanish. For this purpose, we analyzed three transformer models supporting the Spanish language, namely multilingual BERT, BETO and XLM-RoBERTa. By means of a transfer-learning-based approach, the models were first pretrained on a collection of real-world oncology clinical cases with the goal of adapting transformers to the distinctive features of the Spanish oncology domain. The resulting models were further fine-tuned on the Cantemist-NER task, addressing the detection of tumor morphology mentions as a multi-class sequence-labeling problem. To evaluate the effectiveness of the proposed approach, we compared the results obtained by the domain-specific version of the examined transformers with the performance achieved by the general-domain version of the models. The results obtained in this paper empirically demonstrate that, for every analyzed transformer, the clinical version outperformed the corresponding general-domain model on the detection of tumor morphology mentions in clinical case reports in Spanish. Additionally, the combination of the transfer-learning-based approach with an ensemble strategy exploiting the predictive capabilities of the distinct transformer architectures yielded the best results, achieving a precision of 0.893, a recall of 0.887 and an F1-score of 0.89, which remarkably surpassed the prior state-of-the-art performance for the Cantemist-NER task. [es_ES]
dc.identifier.uri: https://hdl.handle.net/10630/22685
dc.language.iso: eng [es_ES]
dc.relation.eventdate: June 2021 [es_ES]
dc.relation.eventplace: Madeira, Portugal [es_ES]
dc.relation.eventtitle: International Work Conference on Artificial Neural Networks (IWANN 2021) [es_ES]
dc.rights.accessRights: open access [es_ES]
dc.subject: Oncología [es_ES]
dc.subject.other: Clinical coding [es_ES]
dc.subject.other: Deep learning [es_ES]
dc.subject.other: Natural Language Processing [es_ES]
dc.subject.other: Text classification [es_ES]
dc.subject.other: Transformers [es_ES]
dc.title: Detection of tumor morphology mentions in clinical reports in Spanish using transformers [es_ES]
dc.type: conference output [es_ES]
dspace.entity.type: Publication
relation.isAuthorOfPublication: b6f27291-58a9-4408-860c-12508516ff67
relation.isAuthorOfPublication: 1e58df71-b337-4856-a5e8-02f8c2e8792b
relation.isAuthorOfPublication: b8ab3a42-65ef-4349-9230-798e19f78426
relation.isAuthorOfPublication.latestForDiscovery: b6f27291-58a9-4408-860c-12508516ff67

Files

Original bundle

Name: IWANN_2021.pdf
Size: 251.93 KB
Format: Adobe Portable Document Format