Pruning dominated policies in multiobjective Pareto Q-learning

dc.centro: E.T.S.I. Informática (en_US)
dc.contributor.author: Mandow-Andaluz, Lorenzo
dc.contributor.author: Pérez-de-la-Cruz-Molina, José Luis
dc.date.accessioned: 2019-10-18T12:34:39Z
dc.date.available: 2019-10-18T12:34:39Z
dc.date.created: 2018
dc.date.issued: 2019-10-18
dc.departamento: Lenguajes y Ciencias de la Computación
dc.description.abstract: The solution to a Multi-Objective Reinforcement Learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence. (en_US)
dc.description.sponsorship: Supported by the Spanish Government, Agencia Estatal de Investigación (AEI) and the European Union, Fondo Europeo de Desarrollo Regional (FEDER), grant TIN2016-80774-R (AEI/FEDER, UE); and the Plan Propio de Investigación de la Universidad de Málaga - Campus de Excelencia Internacional Andalucía Tech. (en_US)
dc.identifier.doi: https://doi.org/10.1007/978-3-030-00374-6_23
dc.identifier.uri: https://hdl.handle.net/10630/18600
dc.language.iso: eng (en_US)
dc.rights.accessRights: open access (en_US)
dc.subject: Aprendizaje (en_US)
dc.title: Pruning dominated policies in multiobjective Pareto Q-learning (en_US)
dc.type: journal article (es_ES)
dc.type.hasVersion: SMUR (es_ES)
dspace.entity.type: Publication
relation.isAuthorOfPublication: b4b11711-73ab-4cd0-854c-8ab2735e829d
relation.isAuthorOfPublication: b7e65043-46cc-445b-8d8f-b4c7ad4f1c06
relation.isAuthorOfPublication.latestForDiscovery: b4b11711-73ab-4cd0-854c-8ab2735e829d
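
The abstract above concerns pruning dominated policies from a Pareto-optimal set. As a minimal illustration of the Pareto-dominance test that such pruning relies on (this is a generic sketch, not the paper's MPQ-learning algorithm; the function names and example vectors are invented for illustration):

```python
# Illustrative sketch of Pareto-dominance filtering over vector-valued
# estimates. NOT the MPQ-learning algorithm from the paper; it only shows
# the dominance test used to discard dominated candidates.

def dominates(u, v):
    """True if vector u Pareto-dominates v: u >= v in every objective
    and u > v in at least one objective."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_front(vectors):
    """Keep only the non-dominated vectors (the Pareto-optimal set)."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors if u != v)]

# Example: two-objective value estimates for candidate policies.
estimates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0)]
print(pareto_front(estimates))
# (1.5, 3.0) is dropped: it is dominated by (2.0, 4.0).
```

In a multiobjective Q-learning setting, a test of this kind would be applied to sets of vector-valued Q-estimates so that dominated entries need not be propagated further.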

Files

Original bundle

Name: 18-caepia-v3-riuma.pdf
Size: 364.81 KB
Format: Adobe Portable Document Format
