Browse LCC - Articles by subject "Machine learning (Artificial intelligence)"
Showing items 1-1 of 1
A temporal difference method for multi-objective reinforcement learning
(2019-10-17) This work describes MPQ-learning, a temporal-difference method that approximates the set of all non-dominated policies in multi-objective Markov decision problems, where rewards are vectors and each component stands for ...
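
To illustrate what "non-dominated" means when rewards are vectors, here is a minimal Python sketch of a Pareto filter over value vectors. It is not MPQ-learning and is not taken from the article. The names `dominates` and `non_dominated` are hypothetical, and the sketch assumes every objective is to be maximized.

```python
from typing import Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (assumption: maximization).

    `a` dominates `b` when it is at least as good in every component
    and strictly better in at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated(vectors: Iterable[Sequence[float]]) -> List[Vector]:
    """Keep only the vectors that no other vector in the input dominates."""
    # Drop exact duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(tuple(v) for v in vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


if __name__ == "__main__":
    # Hypothetical two-objective value vectors, one per candidate policy.
    candidates = [(1.0, 2.0), (2.0, 1.0), (0.0, 0.0), (1.0, 1.0)]
    print(non_dominated(candidates))
```

In this toy input, `(1.0, 2.0)` and `(2.0, 1.0)` each win on a different objective, so neither dominates the other and both are kept; the remaining two vectors are dominated and dropped.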