Pruning dominated policies in multiobjective Pareto Q-learning
Date
2019-10-18
Keywords
Learning
Abstract
The solution to a multiobjective reinforcement learning problem is a set of Pareto-optimal policies. MPQ-learning is a recent algorithm that approximates the set of all Pareto-optimal deterministic policies by directly generalizing Q-learning to the multiobjective setting. In this paper we present a modification of MPQ-learning that avoids useless cyclical policies and thus reduces the number of training steps required for convergence.
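To illustrate what a "set of Pareto-optimal policies" means, the following minimal sketch filters a list of vector-valued returns down to its non-dominated subset. It is not the MPQ-learning algorithm or the modification proposed in the paper. The function names `dominates` and `pareto_front` and the sample values are assumptions chosen for illustration, and every objective is assumed to be maximized.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]


def dominates(u: Vector, v: Vector) -> bool:
    """Return True if u Pareto-dominates v (maximization assumed).

    u dominates v when u is at least as good in every objective
    and strictly better in at least one.
    """
    at_least_as_good = all(a >= b for a, b in zip(u, v))
    strictly_better = any(a > b for a, b in zip(u, v))
    return at_least_as_good and strictly_better


def pareto_front(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates."""
    # Drop exact duplicates while preserving the input order.
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


if __name__ == "__main__":
    # Hypothetical two-objective returns of five candidate policies.
    returns = [(1.0, 5.0), (3.0, 3.0), (2.0, 2.0), (5.0, 1.0), (3.0, 3.0)]
    print(pareto_front(returns))
```

In this example the return (2.0, 2.0) is discarded because (3.0, 3.0) is better in both objectives, while the remaining three vectors are mutually incomparable and together form the Pareto front.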