Pruning dominated policies in multiobjective Pareto Q-learning
Abstract
The solution to a multi-objective reinforcement learning problem
is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates
the whole set of Pareto-optimal deterministic policies by directly
generalizing Q-learning to the multiobjective setting. In this paper we present a
modification of MPQ-learning that avoids useless cyclical policies and thus reduces
the number of training steps required for convergence.
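
To make the notion of Pareto optimality concrete, the following minimal sketch filters a list of vector-valued returns down to its non-dominated subset. It is an illustration only and is not the MPQ-learning algorithm or the modification presented in the paper. The names `dominates` and `pareto_front`, the sample data, and the assumption that every objective is maximized are all hypothetical choices made for this example.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u Pareto-dominates v (assumption: all objectives are maximized).

    u dominates v when u is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Return the non-dominated vectors, keeping input order and dropping duplicates."""
    front: List[Vector] = []
    for v in vectors:
        if v in front:
            continue  # skip exact duplicates
        if any(dominates(u, v) for u in vectors):
            continue  # v is dominated by some other vector
        front.append(v)
    return front


# Hypothetical two-objective returns of five candidate policies.
returns = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (1.0, 1.0)]
print(pareto_front(returns))  # [(1.0, 5.0), (3.0, 3.0), (5.0, 1.0)]
```

Here `(2.0, 2.0)` and `(1.0, 1.0)` are discarded because `(3.0, 3.0)` is better in both objectives, while the three remaining vectors represent incomparable trade-offs.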









