Pruning dominated policies in multiobjective Pareto Q-learning
dc.contributor.author | Mandow-Andaluz, Lorenzo | |
dc.contributor.author | Pérez-de-la-Cruz-Molina, José Luis | |
dc.date.accessioned | 2019-10-18T12:34:39Z | |
dc.date.available | 2019-10-18T12:34:39Z | |
dc.date.created | 2018 | |
dc.date.issued | 2019-10-18 | |
dc.identifier.uri | https://hdl.handle.net/10630/18600 | |
dc.description.abstract | The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the whole set of Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence. | en_US |
dc.description.sponsorship | Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech. | en_US |
dc.language.iso | eng | en_US |
dc.rights | info:eu-repo/semantics/openAccess | en_US |
dc.subject | Learning | en_US |
dc.title | Pruning dominated policies in multiobjective Pareto Q-learning | en_US |
dc.type | info:eu-repo/semantics/article | es_ES |
dc.centro | E.T.S.I. Informática | en_US |
dc.identifier.doi | https://doi.org/10.1007/978-3-030-00374-6_23 | |
dc.type.hasVersion | info:eu-repo/semantics/submittedVersion | es_ES |