RT Journal Article
T1 Pruning dominated policies in multiobjective Pareto Q-learning
A1 Mandow-Andaluz, Lorenzo
A1 Pérez-de-la-Cruz-Molina, José Luis
K1 Learning
AB The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
YR 2019
FD 2019-10-18
LK https://hdl.handle.net/10630/18600
UL https://hdl.handle.net/10630/18600
LA eng
NO Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech.
DS RIUMA. Repositorio Institucional de la Universidad de Málaga
RD 20 Jan 2026