RT Journal Article
T1 Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients.
A1 Ribelles, Nuria
A1 Jerez-Aragonés, José Manuel
A1 Rodríguez-Brazzarola, Pablo
A1 Jiménez-Rodríguez, Begoña
A1 Díaz-Redondo, Tamara
A1 Mesa, Héctor
A1 Márquez, Antonia
A1 Sánchez-Muñoz, Alfonso
A1 Pajares, Bella
A1 Carabantes, Francisco
A1 Bermejo-Pérez, María José
A1 Villar, Ester
A1 Domínguez-Recio, María Emilia
A1 Saez-Lara, Enrique
A1 Gálvez Carvajal, Laura
A1 Godoy-Ortiz, Ana
A1 Franco, Leónardo
A1 Ruiz-Medina, Sofía
A1 López, Irene
A1 Alba-Conejo, Emilio
K1 Breast - Cancer
K1 Natural language processing (Computer science)
AB Background: CDK4/6 inhibitors plus endocrine therapies are the current standard of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there are no well-established clinical or molecular factors that predict patient response. In the era of personalised oncology, new approaches for developing predictive models of response are needed. Materials and methods: Data derived from the electronic health records (EHRs) of real-world patients with HR+/HER2-negative advanced breast cancer were used to develop predictive models for early and late progression to first-line treatment. Two machine learning approaches were used: a classic approach using a dataset of features manually extracted from reviewed patient EHRs, and a second approach using natural language processing (NLP) of free-text clinical notes recorded during medical visits. Results: Of the 610 patients included, there were 473 (77.5%) progressions to first-line treatment, of which 126 (20.6% of all patients) occurred within the first 6 months. A total of 152 patients (24.9%) showed no disease progression within 28 months of starting first-line treatment. The best predictive model for early progression using the manually extracted dataset achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800). The best model for predicting long responders from manually extracted data obtained an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained an AUC of 0.752 (95% CI 0.705–0.799). Conclusions: Using machine learning methods, we developed predictive models for early and late progression to first-line treatment of HR+/HER2-negative metastatic breast cancer, and found that NLP-based machine learning models perform slightly better than predictive models based on manually extracted data.
PB Elsevier
YR 2021
FD 2021
LK https://hdl.handle.net/10630/31244
UL https://hdl.handle.net/10630/31244
LA eng
NO Ribelles N, Jerez JM, Rodriguez-Brazzarola P, Jimenez B, Diaz-Redondo T, Mesa H, Marquez A, Sanchez-Muñoz A, Pajares B, Carabantes F, Bermejo MJ, Villar E, Dominguez-Recio ME, Saez E, Galvez L, Godoy A, Franco L, Ruiz-Medina S, Lopez I, Alba E. Machine learning and natural language processing (NLP) approach to predict early progression to first-line treatment in real-world hormone receptor-positive (HR+)/HER2-negative advanced breast cancer patients. Eur J Cancer. 2021 Feb;144:224-231. doi: 10.1016/j.ejca.2020.11.030. Epub 2020 Dec 26. PMID: 33373867.
NO This article was published in the European Journal of Cancer. This version is licensed under Creative Commons CC-BY-NC-ND.
DS RIUMA. Repositorio Institucional de la Universidad de Málaga
RD 22 Jan 2026