Background: CDK4/6 inhibitors plus endocrine therapy are the current standard
of care in the first-line treatment of HR+/HER2-negative metastatic breast cancer, but there
are no well-established clinical or molecular predictive factors for patient response. In the era
of personalised oncology, new approaches for developing predictive models of response are
needed.
Materials and methods: Data derived from the electronic health records (EHRs) of real-world
patients with HR+/HER2-negative advanced breast cancer were used to develop predictive
models for early and late progression on first-line treatment. Two machine learning approaches
were used: a classic approach using a data set of features manually extracted from the reviewed
EHRs, and a second approach using natural language processing (NLP) of free-text
clinical notes recorded during medical visits.
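The abstract does not describe the NLP pipeline itself. Purely as an illustration, the simplest way to turn free-text notes into model inputs is a bag-of-words representation, in which each note becomes a vector of token counts over a shared vocabulary. The sketch below assumes that representation; the functions `tokenize`, `build_vocabulary` and `vectorize` are hypothetical and are not taken from the study.

```python
import re
from collections import Counter
from typing import List


def tokenize(note: str) -> List[str]:
    """Lowercase a note and keep runs of letters (accented letters included).

    Assumption: digits and punctuation are dropped, so "HER2" becomes "her".
    """
    return re.findall(r"[^\W\d_]+", note.lower())


def build_vocabulary(notes: List[str], min_count: int = 1) -> List[str]:
    """Sorted list of tokens that appear at least `min_count` times overall."""
    counts = Counter(tok for note in notes for tok in tokenize(note))
    return sorted(tok for tok, c in counts.items() if c >= min_count)


def vectorize(note: str, vocabulary: List[str]) -> List[int]:
    """Token counts of one note, in vocabulary order (unknown tokens ignored)."""
    counts = Counter(tokenize(note))
    return [counts[tok] for tok in vocabulary]


notes = ["Bone metastases, stable disease.", "New liver metastases; progression."]
vocab = build_vocabulary(notes)
print(vocab)
print(vectorize(notes[0], vocab))
```

In a real pipeline, a patient's notes would typically be pooled before vectorising, and the count vectors would feed a classifier; TF-IDF weighting or learned text embeddings are common alternatives to raw counts.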
Results: Of the 610 patients included, 473 (77.5%) had disease progression on first-line
treatment, 126 (20.6% of all patients) within the first 6 months. A total of 152 patients
(24.9%) showed no disease progression during the first 28 months after starting first-line treatment.
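The two outcomes can be framed as binary labels derived from time to progression: early progression (within 6 months) and long response (no progression before 28 months). The sketch below is one possible encoding under stated assumptions: `months` is the time from the start of first-line treatment to progression or, for patients without progression, to last follow-up; patients censored before a cut-off get no label; and the boundary handling (strict `<`) is a guess. The abstract does not say how censoring was handled, and the function names are hypothetical.

```python
from typing import Optional

# Cut-offs taken from the abstract; boundary handling (< vs <=) is an assumption.
EARLY_CUTOFF_MONTHS = 6.0
LONG_CUTOFF_MONTHS = 28.0


def early_progression_label(months: float, progressed: bool) -> Optional[int]:
    """1 = progression within 6 months; 0 = known progression-free at 6 months;
    None = censored before 6 months (outcome unknown, assumed excluded)."""
    if progressed and months < EARLY_CUTOFF_MONTHS:
        return 1
    if months >= EARLY_CUTOFF_MONTHS:
        return 0
    return None


def long_responder_label(months: float, progressed: bool) -> Optional[int]:
    """1 = no progression before 28 months; 0 = progression before 28 months;
    None = censored before 28 months (outcome unknown, assumed excluded)."""
    if months >= LONG_CUTOFF_MONTHS:
        return 1
    if progressed:
        return 0
    return None


print(early_progression_label(3.0, True), long_responder_label(30.0, False))
```

Excluding patients censored before a cut-off keeps the labels unambiguous but shrinks the usable cohort; survival-model formulations avoid that loss at the cost of a less direct classification target.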
The best predictive model for early progression using the manually extracted data set
achieved an area under the curve (AUC) of 0.734 (95% CI 0.687–0.782). Using the NLP
free-text processing approach, the best model obtained an AUC of 0.758 (95% CI 0.714–0.800).
The best model to predict long responders using manually extracted data obtained
an AUC of 0.669 (95% CI 0.608–0.730). With NLP free-text processing, the best model attained
an AUC of 0.752 (95% CI 0.705–0.799).
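An AUC can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it. The abstract does not state how the 95% CIs were computed (DeLong's method and bootstrap resampling are both common), so the sketch below assumes a percentile bootstrap and computes the AUC directly from pairwise (Mann-Whitney) comparisons. The names and defaults (`n_boot`, `alpha`, `seed`) are illustrative only.

```python
import random
from typing import List, Tuple


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as P(score of a positive > score of a negative); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(
    scores: List[float],
    labels: List[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC (an assumed method, not the study's)."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        # Resample patients with replacement, keeping score/label pairs together.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


print(auc([0.8, 0.4, 0.6, 0.2], [1, 1, 0, 0]))
```

The pairwise loop is O(positives × negatives), which is negligible for a cohort of a few hundred patients; a rank-based formula gives the same value more efficiently on larger data.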
Conclusions: Using machine learning methods, we developed predictive models for early and
late progression on first-line treatment of HR+/HER2-negative metastatic breast cancer, and
found that NLP-based machine learning models performed slightly better than predictive models
based on manually extracted data.