Show simple item record

dc.contributor.author: Mandow-Andaluz, Lorenzo
dc.contributor.author: Perez-de-la-Cruz-Molina, Jose Luis
dc.date.accessioned: 2019-10-18T12:34:39Z
dc.date.available: 2019-10-18T12:34:39Z
dc.date.created: 2018
dc.date.issued: 2019-10-18
dc.identifier.uri: https://hdl.handle.net/10630/18600
dc.description.abstract: The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence. [en_US]
dc.description.sponsorship: Supported by: the Spanish Government, Agencia Estatal de Investigación (AEI) and European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech. [en_US]
dc.language.iso: eng [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Aprendizaje (Learning) [en_US]
dc.title: Pruning dominated policies in multiobjective Pareto Q-learning [en_US]
dc.type: info:eu-repo/semantics/preprint [en_US]
dc.centro: E.T.S.I. Informática [en_US]
dc.identifier.doi: https://doi.org/10.1007/978-3-030-00374-6_23
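The abstract above centers on pruning Pareto-dominated policies. As a minimal illustrative sketch of that core idea (not code from the paper; the function names and example vectors are assumptions), keeping only the non-dominated vector-valued estimates can be written as:

```python
# Illustrative sketch of Pareto-dominance pruning: given a set of
# vector-valued reward estimates, keep only those not dominated by
# any other. Names and data are hypothetical, not from the paper.

def dominates(a, b):
    """True if vector a Pareto-dominates b: >= in every objective, > in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(vectors):
    """Return the non-dominated subset of a list of reward vectors."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors if u != v)]

# Example with two objectives: (2, 1) and (1, 1) are dominated and pruned.
estimates = [(3, 1), (2, 2), (1, 3), (2, 1), (1, 1)]
print(pareto_front(estimates))  # → [(3, 1), (2, 2), (1, 3)]
```

In a multi-objective Q-learning setting such a filter would be applied to the sets of vector estimates kept per state-action pair, so that dominated (and hence useless) candidate policies are discarded during training.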


Files in this item

This item appears in the following collection(s)
