RT Journal Article
T1 A temporal difference method for multi-objective reinforcement learning
A1 Ruiz-Montiel, Manuela
A1 Mandow-Andaluz, Lorenzo
A1 Pérez-de-la-Cruz-Molina, José Luis
K1 Machine learning (Artificial intelligence)
AB This work describes MPQ-learning, a temporal-difference method that approximates the set of all non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for an objective to maximize. Unlike other approximations to Multi-objective Reinforcement Learning, MPQ-learning does not require additional parameters or preference information, and can be applied to non-convex Pareto frontiers. We also present the results of the application of MPQ-learning to some benchmark problems and compare it to a linearization procedure.
YR 2019
FD 2019-10-17
LK https://hdl.handle.net/10630/18596
UL https://hdl.handle.net/10630/18596
LA eng
NO This work is partially funded by grants TIN2009-14179 (Spanish Government, Plan Nacional de I+D+i) and TIN2016-80774-R (AEI/FEDER, UE) (Spanish Government, Agencia Estatal de Investigación; and European Union, Fondo Europeo de Desarrollo Regional). Manuela Ruiz-Montiel is funded by the Spanish Ministry of Education through the National F.P.U. Program.
DS RIUMA. Repositorio Institucional de la Universidad de Málaga
RD 21 Jan 2026